
# Product Recommendations System

# Outcomes:

* Create a report that identifies the best-selling products per category.
* Derive a system to analyze the least-selling products and help inform actions based on unfavorable reviews.
* Solve the problem and host it on your GitHub page. Once done, share your GitHub link with us by emailing ryse.tii@target.com

Name: Tarakeshwari S N

Email: sntarakeshwari@gmail.com

**Method of Solving**:

* We estimate the probability that each review expresses a positive sentiment with a classifier that uses the review text and rating as independent features and doRecommend as the dependent (target) feature.

* Using the same model, we predict the probability that the title of each review indicates a positive sentiment.

* At this stage, we have sentiment scores derived from both reviews and titles, along with the original rating and doRecommend values.

* To identify the best- and least-selling products, we define a composite metric that scores product performance from these features.

* Score for each review = doRecommend + rating + review_prob + title_prob, where review_prob and title_prob are the predicted probabilities for the review text and the title.

* The final output will include the best- and least-selling products, along with their average rating and their most representative positive or negative review and title. This provides actionable insights for improving product offerings and understanding what drives product performance.
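Written per review $i$, with $p^{\text{review}}_i$ and $p^{\text{title}}_i$ denoting the model's predicted probabilities of doRecommend = 1 (stored as review_prob and title_prob), the metric is the sum below. Since doRecommend and both probabilities lie in $[0, 1]$ and rating lies in $[1, 5]$, every review scores between 1 and 8. For example, review 27403 in the scored table (3 stars, doRecommend = 0) gives:

$$
\text{score}_i = \text{doRecommend}_i + \text{rating}_i + p^{\text{review}}_i + p^{\text{title}}_i, \qquad 1 \le \text{score}_i \le 8
$$

$$
\text{score}_{27403} = 0 + 3 + 0.226147 + 0.572523 = 3.798670
$$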

In [61]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import re
import string
import nltk

# Stopwords and text-processing utilities
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
import plotly.express as px
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('omw-1.4')

# Sentiment intensity analyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# Machine learning libraries
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

from sklearn.preprocessing import LabelEncoder, StandardScaler
import sklearn.metrics as metrics
from sklearn.compose import ColumnTransformer



In [62]:
# The imports from the previous cell are reused; only the punkt_tab tokenizer data is new
nltk.download('punkt_tab')



**Loading Dataset**

In [63]:
data = pd.read_excel("data.xlsx", engine="openpyxl")


In [64]:
data

| | product | source | categories | date | didPurchase | doRecommend | rating | reviews | title |
|---|---|---|---|---|---|---|---|---|---|
| 0 | electronics brand product name Tablet A 10.1 T... | Target | Electronics,iPad & Tablets,All Tablets,XYZ bra... | 2021-01-13T00:00:00.000Z | | 1.0 | 5.0 | This product so far has not disappointed. My c... | brand name |
| 1 | electronics brand product name Tablet A 10.1 T... | Target | Electronics,iPad & Tablets,All Tablets,XYZ bra... | 2021-01-13T00:00:00.000Z | | 1.0 | 5.0 | great for beginner or experienced person. Boug... | very fast |
| 2 | electronics brand product name Tablet A 10.1 T... | Target | Electronics,iPad & Tablets,All Tablets,XYZ bra... | 2021-01-13T00:00:00.000Z | | 1.0 | 5.0 | Inexpensive tablet for him to use and learn on... | Beginner tablet for our 9 year old son. |
| 3 | electronics brand product name Tablet A 10.1 T... | Target | Electronics,iPad & Tablets,All Tablets,XYZ bra... | 2021-01-13T00:00:00.000Z | | 1.0 | 4.0 | I've had my XYZ brand HD 8 two weeks now and I... | Good!!! |
| 4 | electronics brand product name Tablet A 10.1 T... | Target | Electronics,iPad & Tablets,All Tablets,XYZ bra... | 2021-01-12T00:00:00.000Z | | 1.0 | 5.0 | I bought this for my grand daughter when she c... | Fantastic Tablet for kids |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 34655 | | Target | Computers/Tablets & Networking,Tablet & eBook ... | 2012-09-18T00:00:00Z | | | 3.0 | This is not appreciably faster than any other ... | Not appreciably faster than any other 1.8A cha... |
| 34656 | | Target | Computers/Tablets & Networking,Tablet & eBook ... | 2012-11-21T00:00:00Z | | | 1.0 | Target should include this charger with the br... | Should be included |
| 34657 | | Target | Computers/Tablets & Networking,Tablet & eBook ... | 2012-10-19T00:00:00Z | | | 1.0 | Love my brand name XYZ brand but I am really d... | Disappointing Charger |
| 34658 | | Target | Computers/Tablets & Networking,Tablet & eBook ... | 2012-10-31T00:00:00Z | | | 1.0 | I was surprised to find it did not come with a... | Not worth the money |


# Exploratory Data Analysis (EDA)

**Shape of dataset**

In [65]:
data.shape

(34660, 9)

**Summary of data**

In [66]:
data.describe(include = 'all')

| | product | source | categories | date | didPurchase | doRecommend | rating | reviews | title |
|---|---|---|---|---|---|---|---|---|---|
| count | 27900 | 34660 | 34660 | 34621 | 1.0 | 34066.0 | 34627.0 | 34658 | 34654 |
| unique | 60 | 6 | 44 | 1078 | | | | 34658 | 19686 |
| top | XYZ brand Tablet, 7 Display, Wi-Fi, 8 GB - Inc... | Target | XYZ brand Tablets,Tablets,Computers & Tablets,... | 2021-01-16T00:00:00.000Z | | | | This product so far has not disappointed. My c... | Great product |
| freq | 10966 | 28701 | 10966 | 710 | | | | 1 | 645 |
| mean | | | | | 1.0 | 0.959373 | 4.584573 | | |
| std | | | | | | 0.197427 | 0.735653 | | |
| min | | | | | 1.0 | 0.0 | 1.0 | | |
| 25% | | | | | 1.0 | 1.0 | 4.0 | | |
| 50% | | | | | 1.0 | 1.0 | 5.0 | | |
| 75% | | | | | 1.0 | 1.0 | 5.0 | | |


**Data types present in the dataset**

In [67]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34660 entries, 0 to 34659
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   product      27900 non-null  object 
 1   source       34660 non-null  object 
 2   categories   34660 non-null  object 
 3   date         34621 non-null  object 
 4   didPurchase  1 non-null      float64
 5   doRecommend  34066 non-null  float64
 6   rating       34627 non-null  float64
 7   reviews      34658 non-null  object 
 8   title        34654 non-null  object 
dtypes: float64(3), object(6)
memory usage: 2.4+ MB


**Null Values**

In [68]:
data.isnull().sum()*100/len(data)

product        19.503751
source          0.000000
categories      0.000000
date            0.112522
didPurchase    99.997115
doRecommend     1.713791
rating          0.095211
reviews         0.005770
title           0.017311
dtype: float64

*didPurchase can be removed because 99.997% of its values are null; date can also be removed, since it is not used in this analysis.*
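The percentages above come from `isnull().sum()*100/len(data)`. The minimal standard-library sketch below shows the same calculation, plus a threshold rule for choosing which columns to drop. The 50% cut-off, the helper names `null_percentages` and `columns_to_drop`, and the toy records are all hypothetical. Applied to the notebook's numbers, such a rule would flag only didPurchase. date (0.11% null) would not be caught, so it has to be listed explicitly. The rows missing product (19.5%) are handled separately by dropping rows.

```python
def null_percentages(rows, columns):
    """Percentage of missing (None) values per column, like isnull().sum()*100/len(data)."""
    n = len(rows)
    return {c: 100.0 * sum(r.get(c) is None for r in rows) / n for c in columns}


def columns_to_drop(pcts, threshold=50.0):
    """Columns whose share of missing values exceeds a (hypothetical) threshold."""
    return [c for c, p in pcts.items() if p > threshold]


# Hypothetical toy records; a missing key or a None value both count as null
rows = [
    {"rating": 5.0, "didPurchase": None, "title": "Great"},
    {"rating": 4.0, "didPurchase": None, "title": None},
    {"rating": None, "didPurchase": 1.0, "title": "Ok"},
    {"rating": 3.0, "title": "Meh"},
]
pcts = null_percentages(rows, ["rating", "didPurchase", "title"])
print(pcts)                   # {'rating': 25.0, 'didPurchase': 75.0, 'title': 25.0}
print(columns_to_drop(pcts))  # ['didPurchase']
```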



**Duplicates**

In [69]:
data.duplicated().sum()

0

**Unique Values**

In [70]:
data.nunique()

product           60
source             6
categories        44
date            1078
didPurchase        1
doRecommend        2
rating             5
reviews        34658
title          19686
dtype: int64

**Value Counts**

In [71]:
cols = data.columns.to_list()
for i in cols:
  print("**************************")
  print(data[i].value_counts())

**************************
product
XYZ brand Tablet, 7 Display, Wi-Fi, 8 GB - Includes Special Offers, Magenta    10966
retail brand brand name Paperwhite - eBook reader - 4 GB - 6 monochrome Paperwhite - touchscreen - Wi-Fi - black,,,    3176
electronics brand product name Tablet A 10.1 Tablet, 8 HD Display, Wi-Fi, 16 GB - Includes Special Offers, Magenta    2814
retail brand XYZ brand Tv,,,_x000D_\nretail brand XYZ brand Tv,,,    2526
electonics brand Home_x000D_\nele

In [72]:
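# Drop the didPurchase and date columns
data.drop(['didPurchase', 'date'], axis = 1, inplace = True)

In [73]:
# Keep an unfiltered copy before dropping incomplete rows
impure_data = data.copy()

In [74]:
# Drop every row that has at least one null value (34660 -> 27405 rows)
data.dropna(axis = 0, how = 'any', inplace = True)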

In [75]:
data.reset_index(drop = True, inplace = True)

In [None]:
# Shape of the cleaned data
data.shape

(27405, 7)

In [77]:
# Confirm that no null values remain
data.isnull().sum()

product        0
source         0
categories     0
doRecommend    0
rating         0
reviews        0
title          0
dtype: int64

# Text preprocessing
To prepare textual data for machine learning, we apply a series of preprocessing steps. These transformations help standardize the data and reduce noise, improving the performance and accuracy of downstream models. The key preprocessing steps are:

1. **Lowercasing:**
Converting all text to lowercase ensures uniformity, preventing the model from treating “Good” and “good” as separate tokens.

2. **Removing Punctuation:**
Punctuation typically carries little meaningful information for this kind of sentiment model. Removing it reduces noise and vocabulary size.

3. **Removing Stopwords:**
Stopwords are common words (e.g., “is,” “the,” “and,” “have”) that carry minimal semantic value. Removing them reduces dimensionality and focuses the model on meaningful content.

4. **Stemming:**
Stemming reduces words to a root form by stripping suffixes (e.g., “loved” → “love,” “playing” → “play”). It is fast and rule-based but can produce non-dictionary forms, such as “batteri” and “easi” in the TF-IDF vocabulary below.

5. **Lemmatization:**
Lemmatization maps words to their dictionary form (lemma) based on a part-of-speech tag (e.g., “better” → “good” as an adjective, “running” → “run” as a verb). It is more accurate than stemming but computationally heavier.
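Steps 1 to 3 need nothing beyond the standard library. The sketch below applies them to a made-up review. `basic_clean` and its small `STOPWORDS` set are hypothetical stand-ins for NLTK's English stopword list, and the sketch splits on whitespace instead of using `word_tokenize`. Steps 4 and 5 rely on NLTK's `PorterStemmer` and `WordNetLemmatizer`, as in `preprocessing_text`.

```python
import string

# Hypothetical, tiny stopword list; the notebook uses NLTK's English stopwords
STOPWORDS = {"a", "and", "have", "i", "is", "it", "the", "this"}


def basic_clean(text):
    """Steps 1-3: lowercase, strip ASCII punctuation, drop stopwords."""
    text = text.lower()                                               # 1. lowercasing
    text = text.translate(str.maketrans("", "", string.punctuation))  # 2. punctuation
    return [word for word in text.split() if word not in STOPWORDS]   # 3. stopwords


print(basic_clean("This tablet is GREAT, and the kids love it!"))
# ['tablet', 'great', 'kids', 'love']
```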



In [78]:
# Build the stopword set, stemmer and lemmatizer once instead of on every call
STOP_WORDS = set(stopwords.words('english'))
PS = PorterStemmer()
LEMMATIZER = WordNetLemmatizer()

def preprocessing_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # Tokenize and remove stopwords
    text_tokens = word_tokenize(text)
    filtered_words = [word for word in text_tokens if word not in STOP_WORDS]
    # Stemming
    stemmed_words = [PS.stem(w) for w in filtered_words]
    # Lemmatizing: applied to the stemmed tokens, treating each one as an adjective (pos='a')
    lemma_words = [LEMMATIZER.lemmatize(w, pos='a') for w in stemmed_words]
    return " ".join(lemma_words)

In [79]:
data['doRecommend'].value_counts()

doRecommend
1.0    26255
0.0     1150
Name: count, dtype: int64
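These counts are heavily imbalanced: only 1,150 of 27,405 rows (about 4.2%) have doRecommend = 0. The random forest below is trained with `class_weight='balanced'`. scikit-learn documents this mode as `n_samples / (n_classes * np.bincount(y))`. For illustration, the sketch below applies that arithmetic to the full cleaned dataset's counts. Inside each CV fold, the model actually computes its weights from the `y` passed to `fit`, so the exact values differ slightly.

```python
# doRecommend class counts after dropping nulls (value_counts() output above)
counts = {1.0: 26255, 0.0: 1150}

n_samples = sum(counts.values())  # 27405
n_classes = len(counts)           # 2

# class_weight='balanced' rule: n_samples / (n_classes * count of the class)
weights = {label: n_samples / (n_classes * count) for label, count in counts.items()}
print({label: round(w, 3) for label, w in weights.items()})  # {1.0: 0.522, 0.0: 11.915}
```

Each class then contributes the same total weight (13,702.5 here). As a result, a misclassified negative costs roughly 23 times as much as a misclassified positive.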

In [80]:
data['reviews_cleaned'] = data['reviews'].astype(str).apply(preprocessing_text)
data['title_cleaned'] = data['title'].astype(str).apply(preprocessing_text)

# Predicting the probability that doRecommend = 1 from the review text and rating

Based on the assumption that a guest recommends a product only when they post a positive review, the model is built to predict the likelihood of a review being positive.

The text features are converted to numeric vectors with TF-IDF over unigrams, bigrams, and trigrams, keeping the 100 most frequent terms (max_features=100).
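`TfidfVectorizer` builds word n-grams of length 1 to 3. It weights each term by its count in the document times a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and then L2-normalises each row. `max_features=100` keeps only the 100 terms with the highest total count. The standard-library sketch below reproduces that weighting on a tiny hypothetical corpus. It simplifies in two ways: it splits on whitespace (scikit-learn's default token pattern also drops one-character tokens), and it keeps the full vocabulary.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=3):
    """Word n-grams for every n in [n_min, n_max], space-joined like scikit-learn's."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_min=1, n_max=3):
    """Raw-count TF x smoothed IDF, L2-normalised per document.

    Follows TfidfVectorizer's defaults (smooth_idf=True, sublinear_tf=False,
    norm='l2') but tokenises on whitespace and applies no max_features cut.
    """
    grams = [Counter(ngrams(doc.split(), n_min, n_max)) for doc in docs]
    n_docs = len(docs)
    df = Counter(term for doc in grams for term in doc)  # documents containing each term
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1 for term, d in df.items()}

    vectors = []
    for doc in grams:
        weights = {term: tf * idf[term] for term, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Two hypothetical cleaned reviews
vecs = tfidf(["work great", "great tablet work great"])
print(round(vecs[0]["work great"], 4))  # 0.5774
```

Terms shared by both documents get idf = 1. Terms unique to the second review get ln(3/2) + 1 ≈ 1.405, so rarer n-grams are weighted up.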

In [81]:
# Target for both models; the feature matrices X_review and X_title are built with TF-IDF in the next cell
y_review = data.loc[:, ['doRecommend']]
y_title = data.loc[:, ['doRecommend']]

In [None]:
# Keep the 100 most frequent unigrams, bigrams and trigrams of the cleaned reviews
tfidf = TfidfVectorizer(max_features=100, ngram_range=(1,3))

# Reviews vectorization: fit the vocabulary on the review text and prepend the rating column
review_tfidf_matrix = tfidf.fit_transform(data["reviews_cleaned"])
review_tfidf_df = pd.DataFrame(review_tfidf_matrix.toarray(), columns=tfidf.get_feature_names_out())
X_review = pd.concat([data[['rating']].reset_index(drop=True), review_tfidf_df], axis=1)

# Titles vectorization: reuse the review vocabulary so the review model can score titles
title_tfidf_matrix = tfidf.transform(data["title_cleaned"])
title_tfidf_df = pd.DataFrame(title_tfidf_matrix.toarray(), columns=tfidf.get_feature_names_out())
X_title = pd.concat([data[['rating']].reset_index(drop=True), title_tfidf_df], axis=1)

# Stratified 80/20 train-test split preserves the doRecommend class ratio in both parts
X_train_review, X_test_review, y_train_review, y_test_review = train_test_split(
    X_review, y_review, test_size=0.2, stratify=y_review, random_state=42
)


In [83]:
X_review

| | rating | abl | alexa | also | app | ask | batteri | best | book | bought | ... | want | watch | well | work | work great | would | xyz | xyz brand | year | year old |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 |
| 1 | 5.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.514421 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 |
| 2 | 5.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 |
| 3 | 4.0 | 0.198316 | 0.000000 | 0.000000 | 0.294394 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | ... | 0.0 | 0.197399 | 0.168256 | 0.0 | 0.0 | 0.0 | 0.320098 | 0.321616 | 0.0 | 0.0 |
| 4 | 5.0 | 0.000000 | 0.000000 | 0.243839 | 0.000000 | 0.000000 | 0.0 | 0.239830 | 0.0 | 0.360045 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 27400 | 5.0 | 0.156383 | 0.274140 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.273428 | 0.0 | 0.000000 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 |
| 27401 | 5.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | ... | 0.0 | 0.000000 | 0.298681 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 |
| 27402 | 4.0 | 0.000000 | 0.235838 | 0.000000 | 0.000000 | 0.273097 | 0.0 | 0.000000 | 0.0 | 0.000000 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 |
| 27403 | 3.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | ... | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 |


In [84]:
y_review

| | doRecommend |
|---|---|
| 0 | 1.0 |
| 1 | 1.0 |
| 2 | 1.0 |
| 3 | 1.0 |
| 4 | 1.0 |
| ... | ... |
| 27400 | 1.0 |
| 27401 | 1.0 |
| 27402 | 1.0 |
| 27403 | 0.0 |


['rating',
 'abl',
 'alexa',
 'also',
 'app',
 'ask',
 'batteri',
 'best',
 'book',
 'bought',
 'brand',
 'brand name',
 'brand name xyz',
 'buy',
 'cant',
 'christma',
 'control',
 'could',
 'daughter',
 'day',
 'devic',
 'dont',
 'download',
 'easi',
 'easi use',
 'echo',
 'enjoy',
 'even',
 'everyth',
 'featur',
 'first',
 'fun',
 'game',
 'get',
 'gift',
 'go',
 'good',
 'got',
 'great',
 'great tablet',
 'happi',
 'home',
 'im',
 'ipad',
 'kid',
 'learn',
 'life',
 'light',
 'like',
 'littl',
 'long',
 'look',
 'lot',
 'love',
 'make',
 'mani',
 'movi',
 'much',
 'music',
 'name',
 'name xyz',
 'name xyz brand',
 'need',
 'new',
 'nice',
 'old',
 'one',
 'paperwhit',
 'perfect',
 'play',
 'price',
 'product',
 'purchas',
 'qualiti',
 'read',
 'reader',
 'realli',
 'recommend',
 'screen',
 'set',
 'size',
 'small',
 'son',
 'sound',
 'still',
 'tablet',
 'take',
 'target',
 'thing',
 'time',
 'use',
 'want',
 'watch',
 'well',
 'work',
 'work great',
 'would',
 'xyz',
 'xyz brand',
 'year',
 'year old']

In [86]:
X_train_review.reset_index(drop = True, inplace = True)
y_train_review.reset_index(drop = True, inplace = True)

## Random Forest

In [None]:
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

test_fold_preds = []    # hard test-set predictions from each fold's model
review_fold_preds = []  # P(doRecommend = 1) for every review, from each fold's model
title_fold_preds = []   # P(doRecommend = 1) for every title, from each fold's model

def print_metrics(name, y_true, y_pred):
    # Accuracy, confusion matrix and classification report for one evaluation set
    print(f"----------------{name}------------------")
    print(f"accuracy score: {round(accuracy_score(y_true, y_pred) * 100, 2)}%")
    print(f"Confusion Matrix: {confusion_matrix(y_true, y_pred)}")
    print(f"Classification Report: {classification_report(y_true, y_pred)}")

for i, (train_index, test_index) in enumerate(skf.split(X_train_review, y_train_review), start=1):
    print(f"{i} Fold of {skf.n_splits} StratifiedKFold")
    xtr, xvl = X_train_review.loc[train_index], X_train_review.loc[test_index]
    ytr, yvl = y_train_review.loc[train_index], y_train_review.loc[test_index]

    rf_model_review = RandomForestClassifier(n_jobs=-1, random_state=42, class_weight='balanced')
    # Pass y as a 1-D array; a one-column DataFrame triggers scikit-learn's DataConversionWarning
    rf_model_review.fit(xtr, ytr.values.ravel())

    # Predictions
    tr_pred = rf_model_review.predict(xtr)               # training fold
    vl_pred = rf_model_review.predict(xvl)               # validation fold
    test_preds = rf_model_review.predict(X_test_review)  # held-out test set
    review_preds = rf_model_review.predict(X_review)     # all reviews, including training rows
    review_prob = rf_model_review.predict_proba(X_review)[:, 1]
    title_preds = rf_model_review.predict(X_title)       # titles, scored with the review model
    title_prob = rf_model_review.predict_proba(X_title)[:, 1]

    print_metrics("Training", ytr, tr_pred)
    print_metrics("Validation", yvl, vl_pred)
    print_metrics("Testing", y_test_review, test_preds)
    print_metrics("Review Predictions", y_review, review_preds)
    print_metrics("Title Predictions", y_title, title_preds)

    # Store this fold's predictions
    test_fold_preds.append(test_preds)
    review_fold_preds.append(review_prob)
    title_fold_preds.append(title_prob)

1 Fold of 10 StratifiedKFold



----------------Training------------------
accuracy score: 99.9%
Confusion Matrix: [[  827     1]
 [   19 18884]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.98      1.00      0.99       828
         1.0       1.00      1.00      1.00     18903

    accuracy                           1.00     19731
   macro avg       0.99      1.00      0.99     19731
weighted avg       1.00      1.00      1.00     19731

----------------Validation------------------
accuracy score: 97.13%
Confusion Matrix: [[  46   46]
 [  17 2084]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.73      0.50      0.59        92
         1.0       0.98      0.99      0.99      2101

    accuracy                           0.97      2193
   macro avg       0.85      0.75      0.79      2193
weighted avg       0.97      0.97      0.97      2193

----------------Testing------------------
accuracy score: 97.45%
Confusio


----------------Training------------------
accuracy score: 99.87%
Confusion Matrix: [[  827     1]
 [   24 18879]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.97      1.00      0.99       828
         1.0       1.00      1.00      1.00     18903

    accuracy                           1.00     19731
   macro avg       0.99      1.00      0.99     19731
weighted avg       1.00      1.00      1.00     19731

----------------Validation------------------
accuracy score: 97.31%
Confusion Matrix: [[  47   45]
 [  14 2087]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.77      0.51      0.61        92
         1.0       0.98      0.99      0.99      2101

    accuracy                           0.97      2193
   macro avg       0.87      0.75      0.80      2193
weighted avg       0.97      0.97      0.97      2193

----------------Testing------------------
accuracy score: 97.52%
Confusi


----------------Training------------------
accuracy score: 99.87%
Confusion Matrix: [[  827     1]
 [   24 18879]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.97      1.00      0.99       828
         1.0       1.00      1.00      1.00     18903

    accuracy                           1.00     19731
   macro avg       0.99      1.00      0.99     19731
weighted avg       1.00      1.00      1.00     19731

----------------Validation------------------
accuracy score: 97.81%
Confusion Matrix: [[  61   31]
 [  17 2084]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.78      0.66      0.72        92
         1.0       0.99      0.99      0.99      2101

    accuracy                           0.98      2193
   macro avg       0.88      0.83      0.85      2193
weighted avg       0.98      0.98      0.98      2193

----------------Testing------------------
accuracy score: 97.34%
Confusi


----------------Training------------------
accuracy score: 99.87%
Confusion Matrix: [[  827     1]
 [   25 18878]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.97      1.00      0.98       828
         1.0       1.00      1.00      1.00     18903

    accuracy                           1.00     19731
   macro avg       0.99      1.00      0.99     19731
weighted avg       1.00      1.00      1.00     19731

----------------Validation------------------
accuracy score: 97.22%
Confusion Matrix: [[  56   36]
 [  25 2076]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.69      0.61      0.65        92
         1.0       0.98      0.99      0.99      2101

    accuracy                           0.97      2193
   macro avg       0.84      0.80      0.82      2193
weighted avg       0.97      0.97      0.97      2193

----------------Testing------------------
accuracy score: 97.39%
Confusi


----------------Training------------------
accuracy score: 99.87%
Confusion Matrix: [[  827     1]
 [   24 18880]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.97      1.00      0.99       828
         1.0       1.00      1.00      1.00     18904

    accuracy                           1.00     19732
   macro avg       0.99      1.00      0.99     19732
weighted avg       1.00      1.00      1.00     19732

----------------Validation------------------
accuracy score: 97.13%
Confusion Matrix: [[  49   43]
 [  20 2080]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.71      0.53      0.61        92
         1.0       0.98      0.99      0.99      2100

    accuracy                           0.97      2192
   macro avg       0.84      0.76      0.80      2192
weighted avg       0.97      0.97      0.97      2192

----------------Testing------------------
accuracy score: 97.56%
Confusi


----------------Training------------------
accuracy score: 99.9%
Confusion Matrix: [[  828     0]
 [   20 18884]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.98      1.00      0.99       828
         1.0       1.00      1.00      1.00     18904

    accuracy                           1.00     19732
   macro avg       0.99      1.00      0.99     19732
weighted avg       1.00      1.00      1.00     19732

----------------Validation------------------
accuracy score: 97.26%
Confusion Matrix: [[  53   39]
 [  21 2079]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.72      0.58      0.64        92
         1.0       0.98      0.99      0.99      2100

    accuracy                           0.97      2192
   macro avg       0.85      0.78      0.81      2192
weighted avg       0.97      0.97      0.97      2192

----------------Testing------------------
accuracy score: 97.5%
Confusion


----------------Training------------------
accuracy score: 99.89%
Confusion Matrix: [[  826     2]
 [   19 18885]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.98      1.00      0.99       828
         1.0       1.00      1.00      1.00     18904

    accuracy                           1.00     19732
   macro avg       0.99      1.00      0.99     19732
weighted avg       1.00      1.00      1.00     19732

----------------Validation------------------
accuracy score: 97.45%
Confusion Matrix: [[  47   45]
 [  11 2089]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.81      0.51      0.63        92
         1.0       0.98      0.99      0.99      2100

    accuracy                           0.97      2192
   macro avg       0.89      0.75      0.81      2192
weighted avg       0.97      0.97      0.97      2192

----------------Testing------------------
accuracy score: 97.56%
Confusi


----------------Training------------------
accuracy score: 99.89%
Confusion Matrix: [[  827     1]
 [   20 18884]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.98      1.00      0.99       828
         1.0       1.00      1.00      1.00     18904

    accuracy                           1.00     19732
   macro avg       0.99      1.00      0.99     19732
weighted avg       1.00      1.00      1.00     19732

----------------Validation------------------
accuracy score: 97.58%
Confusion Matrix: [[  52   40]
 [  13 2087]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.80      0.57      0.66        92
         1.0       0.98      0.99      0.99      2100

    accuracy                           0.98      2192
   macro avg       0.89      0.78      0.82      2192
weighted avg       0.97      0.98      0.97      2192

----------------Testing------------------
accuracy score: 97.26%
Confusi


----------------Training------------------
accuracy score: 99.89%
Confusion Matrix: [[  826     2]
 [   20 18884]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.98      1.00      0.99       828
         1.0       1.00      1.00      1.00     18904

    accuracy                           1.00     19732
   macro avg       0.99      1.00      0.99     19732
weighted avg       1.00      1.00      1.00     19732

----------------Validation------------------
accuracy score: 97.67%
Confusion Matrix: [[  54   38]
 [  13 2087]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.81      0.59      0.68        92
         1.0       0.98      0.99      0.99      2100

    accuracy                           0.98      2192
   macro avg       0.89      0.79      0.83      2192
weighted avg       0.97      0.98      0.97      2192

----------------Testing------------------
accuracy score: 97.54%
Confusi


----------------Training------------------
accuracy score: 99.91%
Confusion Matrix: [[  827     1]
 [   16 18888]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.98      1.00      0.99       828
         1.0       1.00      1.00      1.00     18904

    accuracy                           1.00     19732
   macro avg       0.99      1.00      0.99     19732
weighted avg       1.00      1.00      1.00     19732

----------------Validation------------------
accuracy score: 97.13%
Confusion Matrix: [[  55   37]
 [  26 2074]]
Classification Report:               precision    recall  f1-score   support

         0.0       0.68      0.60      0.64        92
         1.0       0.98      0.99      0.99      2100

    accuracy                           0.97      2192
   macro avg       0.83      0.79      0.81      2192
weighted avg       0.97      0.97      0.97      2192

----------------Testing------------------
accuracy score: 97.39%
Confusi
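All ten folds above report 92 validation rows and 828 training rows with doRecommend = 0. In other words, the 920 negatives of the training split are spread evenly across folds, which is what `StratifiedKFold` is designed to do. The sketch below illustrates the idea with a simplified, unshuffled round-robin assignment. `stratified_folds` is a hypothetical helper; scikit-learn's allocation differs in detail and shuffles when `shuffle=True`.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Split indices into k folds that each keep roughly the overall class mix.

    Simplified, unshuffled illustration: the indices of each class are dealt
    round-robin across the folds.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)
    return folds


# Hypothetical labels with a ~4% minority class, similar to doRecommend = 0
labels = [0] * 20 + [1] * 460
folds = stratified_folds(labels, k=10)
print([sum(labels[i] == 0 for i in fold) for fold in folds])  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```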

In [88]:
def mode_of_preds(a):
    # Most frequent label in a; ties resolve to the smallest label because np.unique sorts its output
    u, c = np.unique(a, return_counts=True)
    return u[c.argmax()]
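`mode_of_preds` applies hard (majority) voting: for each test sample, it takes the label predicted by most of the 10 fold models. Because `np.unique` returns sorted labels and `argmax` picks the first maximum, a 5-5 tie goes to 0.0. The sketch below implements the same rule with the standard library; `majority_vote` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter


def majority_vote(fold_preds):
    """Column-wise mode of per-fold label lists; ties go to the smallest label."""
    votes = []
    for column in zip(*fold_preds):
        counts = Counter(column)
        top = max(counts.values())
        votes.append(min(label for label, c in counts.items() if c == top))
    return votes


# Hypothetical predictions from three folds for four test samples
print(majority_vote([[1, 0, 1, 1],
                     [1, 0, 0, 1],
                     [0, 0, 1, 1]]))  # [1, 0, 1, 1]
# With an even number of folds a tie resolves to 0, as in mode_of_preds
print(majority_vote([[1, 0], [0, 0]]))  # [0, 0]
```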

Test-set predictions: majority vote (mode) across the 10 fold models, with value counts

In [89]:
# Majority vote (mode) of the 10 fold predictions for each test sample
final_preds = np.apply_along_axis(mode_of_preds, 0, test_fold_preds)
print(final_preds)

final_preds = pd.DataFrame(np.array(final_preds))

print("Actual")
print(y_test_review.value_counts())

print("Predicted")
print(final_preds.value_counts())

[1. 1. 1. ... 1. 1. 1.]
Actual
doRecommend
1.0            5251
0.0             230
Name: count, dtype: int64
Predicted
0  
1.0    5295
0.0     186
Name: count, dtype: int64


Review predictions: the positive-class probabilities from the 10 fold models are averaged, then thresholded at 0.5 to get value counts

In [90]:
# Average P(doRecommend = 1) for reviews across the 10 fold models
final_review_prob = np.mean(review_fold_preds, axis = 0)
data['review_prob'] = final_review_prob
final_review_class = (final_review_prob>=0.5).astype('int')
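For reviews and titles, the notebook uses soft voting instead. It averages the 10 fold models' positive-class probabilities, thresholds the mean at 0.5, and keeps the averaged probability itself as review_prob. The two voting rules can disagree when one model is confident and the others are only mildly negative. The hypothetical `soft_vote` helper and probabilities below show such a case.

```python
def soft_vote(fold_probs, threshold=0.5):
    """Average each sample's positive-class probability over the folds, then threshold."""
    n_folds = len(fold_probs)
    mean_probs = [sum(column) / n_folds for column in zip(*fold_probs)]
    labels = [int(p >= threshold) for p in mean_probs]
    return mean_probs, labels


# Hypothetical P(doRecommend = 1) from three fold models for two reviews
fold_probs = [
    [0.90, 0.45],
    [0.80, 0.45],
    [1.00, 0.95],
]
mean_probs, labels = soft_vote(fold_probs)
print([round(p, 3) for p in mean_probs], labels)  # [0.9, 0.617] [1, 1]
```

Hard voting on the same numbers, which thresholds each fold first, would label the second review 0, since two of the three folds fall below 0.5.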

In [91]:
final_review_df = pd.DataFrame(final_review_class)
print("Actual")
print(y_review.value_counts())
print("Predicted")
final_review_df.value_counts()

Actual
doRecommend
1.0            26255
0.0             1150
Name: count, dtype: int64
Predicted


0
1    26282
0     1123
Name: count, dtype: int64

Title predictions: the same averaging and 0.5 threshold, applied to the title probabilities

In [92]:
# Average P(doRecommend = 1) for titles across the 10 fold models
final_title_prob = np.mean(title_fold_preds, axis = 0)
data['title_prob'] = final_title_prob
final_title_class = (final_title_prob>=0.5).astype('int')

In [93]:
final_title_df = pd.DataFrame(final_title_class)
print("Actual")
print(y_title.value_counts())
print("Predicted")
final_title_df.value_counts()

Actual
doRecommend
1.0            26255
0.0             1150
Name: count, dtype: int64
Predicted


0
1    25977
0     1428
Name: count, dtype: int64

# Create a score for every review to rank products

In [94]:
data['score'] = data['doRecommend'] + data['rating'] + data['review_prob'] + data['title_prob']
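This section stops at a per-review score, while the stated outcome is a per-category ranking of products. The sketch below shows one minimal way to do that aggregation, with hypothetical rows. It assumes the whole categories string is used as the category key and that products are ranked by their mean review score. `rank_products` is a hypothetical helper. In pandas, the equivalent grouping would be `data.groupby(['categories', 'product'])['score'].mean()`.

```python
from collections import defaultdict
from statistics import mean


def rank_products(rows):
    """Best and worst product per category by mean review score.

    Each row needs 'categories', 'product', 'doRecommend', 'rating',
    'review_prob' and 'title_prob'; the score matches data['score'].
    """
    scores = defaultdict(list)
    for r in rows:
        score = r["doRecommend"] + r["rating"] + r["review_prob"] + r["title_prob"]
        scores[(r["categories"], r["product"])].append(score)

    by_category = defaultdict(list)
    for (category, product), product_scores in scores.items():
        by_category[category].append((mean(product_scores), product))

    # Tuples compare by mean score first, then by product name
    return {category: {"best": max(items)[1], "worst": min(items)[1]}
            for category, items in by_category.items()}


# Hypothetical scored reviews
rows = [
    {"categories": "Tablets", "product": "Tablet A", "doRecommend": 1.0,
     "rating": 5.0, "review_prob": 0.99, "title_prob": 1.0},
    {"categories": "Tablets", "product": "Tablet A", "doRecommend": 1.0,
     "rating": 4.0, "review_prob": 0.95, "title_prob": 0.9},
    {"categories": "Tablets", "product": "Tablet B", "doRecommend": 0.0,
     "rating": 2.0, "review_prob": 0.2, "title_prob": 0.3},
    {"categories": "Chargers", "product": "Charger X", "doRecommend": 0.0,
     "rating": 1.0, "review_prob": 0.1, "title_prob": 0.4},
]
print(rank_products(rows))
```

A mean over only a few reviews is noisy, so requiring a minimum number of reviews per product before ranking may be worth considering.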

In [95]:
data

| | product | source | categories | doRecommend | rating | reviews | title | reviews_cleaned | title_cleaned | review_prob | title_prob | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | electronics brand product name Tablet A 10.1 T... | Target | Electronics,iPad & Tablets,All Tablets,XYZ bra... | 1.0 | 5.0 | This product so far has not disappointed. My c... | brand name | product far disappoint children love use like ... | brand name | 1.000000 | 1.000000 | 8.000000 |
| 1 | electronics brand product name Tablet A 10.1 T... | Target | Electronics,iPad & Tablets,All Tablets,XYZ bra... | 1.0 | 5.0 | great for beginner or experienced person. Boug... | very fast | great beginn experienc person bought gift love | fast | 1.000000 | 1.000000 | 8.000000 |
| 2 | electronics brand product name Tablet A 10.1 T... | Target | Electronics,iPad & Tablets,All Tablets,XYZ bra... | 1.0 | 5.0 | Inexpensive tablet for him to use and learn on... | Beginner tablet for our 9 year old son. | inexpens tablet use learn step one thrill lear... | beginn tablet 9 year old son | 0.994000 | 0.998000 | 7.992000 |
| 3 | electronics brand product name Tablet A 10.1 T... | Target | Electronics,iPad & Tablets,All Tablets,XYZ bra... | 1.0 | 4.0 | I've had my XYZ brand HD 8 two weeks now and I... | Good!!! | ive xyz brand hd 8 two week love tablet great ... | good | 0.998000 | 1.000000 | 6.998000 |
| 4 | electronics brand product name Tablet A 10.1 T... | Target | Electronics,iPad & Tablets,All Tablets,XYZ bra... | 1.0 | 5.0 | I bought this for my grand daughter when she c... | Fantastic Tablet for kids | bought grand daughter come visit set user ente... | fantast tablet kid | 0.999000 | 0.999000 | 7.998000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 27400 | New retail brand brand name XYZ brand Hd 9w Po... | Target | Stereos,Remote Controls,Target Echo,Audio Dock... | 1.0 | 5.0 | This is my new favorite device. While not perf... | My new favorite product | new favorit devic perfect lot un useon featur ... | new favorit product | 0.997000 | 1.000000 | 7.997000 |
| 27401 | New retail brand brand name XYZ brand Hd 9w Po... | Target | Stereos,Remote Controls,Target Echo,Audio Dock... | 1.0 | 5.0 | I got this to basically experiment with. Strai... | Lots of potential!!! | got basic experi straight box realli impress n... | lot potenti | 0.999000 | 1.000000 | 7.999000 |
| 27402 | New retail brand brand name XYZ brand Hd 9w Po... | Target | Stereos,Remote Controls,Target Echo,Audio Dock... | 1.0 | 4.0 | Good product that does the basics. Too bad you... | Good First Generation Product | good product basic bad buy specif light contro... | good first gener product | 0.997189 | 0.995872 | 6.993061 |
| 27403 | New retail brand brand name XYZ brand Hd 9w Po... | Target | Stereos,Remote Controls,Target Echo,Audio Dock... | 0.0 | 3.0 | This is great for a connected home. People who... | Great for a "connected home" | great connect home peopl use buy plan make eve... | great connect home | 0.226147 | 0.572523 | 3.798670 |


In [96]:
print(data.loc[27404,['reviews']].values)

['Cool product. Target does a cool job with it. Great audio quality and like the Philips Hue integration.']


In [97]:
useful_data = data[['categories', 'product', 'score']]
useful_data

Unnamed: 0,categories,product,score
0,"Electronics,iPad & Tablets,All Tablets,XYZ bra...",electronics brand product name Tablet A 10.1 T...,8.000000
1,"Electronics,iPad & Tablets,All Tablets,XYZ bra...",electronics brand product name Tablet A 10.1 T...,8.000000
2,"Electronics,iPad & Tablets,All Tablets,XYZ bra...",electronics brand product name Tablet A 10.1 T...,7.992000
3,"Electronics,iPad & Tablets,All Tablets,XYZ bra...",electronics brand product name Tablet A 10.1 T...,6.998000
4,"Electronics,iPad & Tablets,All Tablets,XYZ bra...",electronics brand product name Tablet A 10.1 T...,7.998000
...,...,...,...
27400,"Stereos,Remote Controls,Target Echo,Audio Dock...",New retail brand brand name XYZ brand Hd 9w Po...,7.997000
27401,"Stereos,Remote Controls,Target Echo,Audio Dock...",New retail brand brand name XYZ brand Hd 9w Po...,7.999000
27402,"Stereos,Remote Controls,Target Echo,Audio Dock...",New retail brand brand name XYZ brand Hd 9w Po...,6.993061
27403,"Stereos,Remote Controls,Target Echo,Audio Dock...",New retail brand brand name XYZ brand Hd 9w Po...,3.798670


In [98]:
df = useful_data.groupby(['categories', 'product'])['score'].mean()
d = df.reset_index().sort_values(['categories','score'], ascending = False).set_index(['categories', 'product'])
d  # all products, ranked from best to least selling within each category

Unnamed: 0_level_0,Unnamed: 1_level_0,score
categories,product,Unnamed: 2_level_1
"eBook Readers,brand name E-readers,Computers & Tablets,E-Readers & Accessories,E-Readers","brand name Oasis E-reader with Leather Charging Cover - Merlot, 6 High-Resolution Display (300 ppi), Wi-Fi - Includes Special Offers,,",7.431744
"brand name E-readers,Electronics Features,Computers & Tablets,E-Readers & Accessories,E-Readers,eBook Readers","Brand New electronics brand IPad16gb 7 Ips Display Tablet Wifi 16 Gb Blue,,,",7.396717
"XYZ brand Tablets,Tablets,Computers & Tablets,All Tablets,Electronics, Tech Toys, Movies, Music,Electronics,iPad & Tablets,Android Tablets,Frys","retail brand - brand name Voyage - 4GB - Wi-Fi + 3G - Black,,,\nretail brand - brand name Voyage - 4GB - Wi-Fi + 3G - Black,,,",8.000000
"XYZ brand Tablets,Tablets,Computers & Tablets,All Tablets,Electronics, Tech Toys, Movies, Music,Electronics,iPad & Tablets,Android Tablets,Frys","Certified Refurbished electronics brand TV (Previous Generation - 1st),,,\nCertified Refurbished electronics brand TV (Previous Generation - 1st),,,",7.993109
"XYZ brand Tablets,Tablets,Computers & Tablets,All Tablets,Electronics, Tech Toys, Movies, Music,Electronics,iPad & Tablets,Android Tablets,Frys","XYZ brand HD 8 Tablet with Alexa, 8 HD Display, 16 GB, Tangerine - with Special Offers,",7.652686
...,...,...
"Computers/Tablets & Networking,Tablets & eBook Readers,Computers & Tablets,Tablets,All Tablets","retail brand brand name Touch Leather Case (4th Generation - 2011 Release), Olive Green,,,_x000D_\nretail brand brand name Touch Leather Case (4th Generation - 2011 Release), Olive Green,,,",7.987667
"Computers/Tablets & Networking,Tablets & eBook Readers,Computers & Tablets,Tablets,All Tablets","Brand New electronics brand IPad16gb 7 Ips Display Tablet Wifi 16 Gb Blue,,,",7.359081
"Computers/Tablets & Networking,Tablets & eBook Readers,Computers & Tablets,Tablets,All Tablets","XYZ brand Kids Edition Tablet, 7 Display, Wi-Fi, 16 GB, Green Kid-Proof Case",7.245684
"Computers & Tablets,Tablets,All Tablets,Computers/Tablets & Networking,Tablets & eBook Readers,XYZ brand Tablets,Frys",\nelectonics brand Home,7.695760


In [99]:
dcp = d.copy()
dcp.reset_index(inplace = True)
dcp

Unnamed: 0,categories,product,score
0,"eBook Readers,brand name E-readers,Computers &...",brand name Oasis E-reader with Leather Chargin...,7.431744
1,"brand name E-readers,Electronics Features,Comp...",Brand New electronics brand IPad16gb 7 Ips Dis...,7.396717
2,"XYZ brand Tablets,Tablets,Computers & Tablets,...",retail brand - brand name Voyage - 4GB - Wi-Fi...,8.000000
3,"XYZ brand Tablets,Tablets,Computers & Tablets,...",Certified Refurbished electronics brand TV (Pr...,7.993109
4,"XYZ brand Tablets,Tablets,Computers & Tablets,...","XYZ brand HD 8 Tablet with Alexa, 8 HD Display...",7.652686
...,...,...,...
67,"Computers/Tablets & Networking,Tablets & eBook...",retail brand brand name Touch Leather Case (4t...,7.987667
68,"Computers/Tablets & Networking,Tablets & eBook...",Brand New electronics brand IPad16gb 7 Ips Dis...,7.359081
69,"Computers/Tablets & Networking,Tablets & eBook...","XYZ brand Kids Edition Tablet, 7 Display, Wi-F...",7.245684
70,"Computers & Tablets,Tablets,All Tablets,Comput...",\nelectonics brand Home,7.695760
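A toy version of the ranking built in cell [98], with made-up categories, products and scores. Sorting by `['categories', 'score']` with `ascending=False` orders the categories in reverse alphabetical order and, within each category, puts the highest mean score first, which is the layout of the table above.

```python
import pandas as pd

# Hypothetical per-review scores
reviews = pd.DataFrame({
    "categories": ["Tablets", "Tablets", "Tablets", "E-Readers", "E-Readers"],
    "product":    ["Tab A",   "Tab A",   "Tab B",   "Reader X",  "Reader Y"],
    "score":      [8.0,       6.0,       7.5,       3.0,         7.9],
})

# Mean score per (category, product), then rank within each category
ranked = (
    reviews.groupby(["categories", "product"])["score"].mean()
    .reset_index()
    .sort_values(["categories", "score"], ascending=False)
    .reset_index(drop=True)
)
print(ranked)
```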


In [100]:
best = d.reset_index().groupby(['categories']).nth(0) #Obtaining the best selling product per category
least = d.reset_index().groupby(['categories']).nth(-1) #Obtaining the least selling product per category
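In pandas 2.x, `groupby(...).nth(n)` acts as a row filter and keeps the original row index, which is why the outputs below show indices such as 0, 1, 2, 12. A sketch of the same first/last selection with `groupby().head(1)` and `tail(1)`, which are also row filters; the data is made up and already sorted best-to-worst within each category, and the `toy_` names are hypothetical so they do not overwrite the notebook's `best` and `least`.

```python
import pandas as pd

# Hypothetical ranking, already sorted best-to-worst within each category
ranked = pd.DataFrame({
    "categories": ["Tablets", "Tablets", "Tablets", "E-Readers"],
    "product":    ["Tab B",   "Tab A",   "Tab C",   "Reader Y"],
    "score":      [7.5,       7.0,       3.2,       7.9],
})

toy_best = ranked.groupby("categories").head(1)   # first (highest) row per category
toy_least = ranked.groupby("categories").tail(1)  # last (lowest) row per category

print(toy_best)
print(toy_least)
```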

In [101]:
best

Unnamed: 0,categories,product,score
0,"eBook Readers,brand name E-readers,Computers &...",brand name Oasis E-reader with Leather Chargin...,7.431744
1,"brand name E-readers,Electronics Features,Comp...",Brand New electronics brand IPad16gb 7 Ips Dis...,7.396717
2,"XYZ brand Tablets,Tablets,Computers & Tablets,...",retail brand - brand name Voyage - 4GB - Wi-Fi...,8.0
12,"XYZ brand Tablets,Tablets,Computers & Tablets,...",electonics brand Home,7.996
16,"XYZ brand Tablets,Tablets,Computers & Tablets,...",Brand New electronics brand IPad16gb 7 Ips Dis...,7.992606
17,"Walmart for Business,Office Electronics,Tablet...","brand name Voyage E-reader, 6 High-Resolution ...",7.719881
19,"Walmart for Business,Office Electronics,Tablet...",retail brand XYZ brand Hd 8 8in Tablet 16gb Bl...,7.999
27,"Walmart for Business,Office Electronics,Tablet...",Certified Refurbished electronics brand TV Sti...,8.0
33,"Tablets,XYZ brand Tablets,Electronics,Computer...","XYZ brand HD 8 Tablet with Alexa, 8 HD Display...",7.830405
35,"Tablets,XYZ brand Tablets,Computers & Tablets,...",retail brand 5W USB Official OEM Charger and P...,8.0


In [102]:
best = best.reset_index()

In [103]:
least

Unnamed: 0,categories,product,score
0,"eBook Readers,brand name E-readers,Computers &...",brand name Oasis E-reader with Leather Chargin...,7.431744
1,"brand name E-readers,Electronics Features,Comp...",Brand New electronics brand IPad16gb 7 Ips Dis...,7.396717
11,"XYZ brand Tablets,Tablets,Computers & Tablets,...",retail brand - brand name Voyage - 4GB - Wi-Fi...,3.327403
15,"XYZ brand Tablets,Tablets,Computers & Tablets,...",electonics brand Home\n,6.620801
16,"XYZ brand Tablets,Tablets,Computers & Tablets,...",Brand New electronics brand IPad16gb 7 Ips Dis...,7.992606
18,"Walmart for Business,Office Electronics,Tablet...",retail brand brand name Paperwhite - eBook rea...,7.704
26,"Walmart for Business,Office Electronics,Tablet...",retail brand Echo and XYZ brand TV Power Adapt...,6.976
32,"Walmart for Business,Office Electronics,Tablet...","brand name Paperwhite E-reader - White, 6 High...",7.426122
34,"Tablets,XYZ brand Tablets,Electronics,Computer...",electronics brand product name Tablet A 10.1 T...,7.412922
39,"Tablets,XYZ brand Tablets,Computers & Tablets,...",retail brand 5W USB Official OEM Charger and P...,6.91451


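As you can see, some products with a good score also appear in the least-selling products table: every category contributes its lowest-ranked product, even when all of its products score well, and a category with only one product contributes the same product to both tables. Let's remove them using a score threshold.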

Remove if *Score >= 5*

In [104]:
least = least.reset_index()
least_criteria_0 = least[least['score'] < 5]
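A sketch, on made-up data, of why good products show up in `least` and of the `score < 5` cut. The shared rows come from single-product categories, where the first and last row are the same product. The threshold of 5 is the notebook's choice; the `toy_` names are hypothetical.

```python
import pandas as pd

# Hypothetical best/least tables, one row per category
toy_best = pd.DataFrame({
    "categories": ["Tablets", "E-Readers", "Chargers"],
    "product":    ["Tab B",   "Reader Y",  "Charger 5W"],
    "score":      [7.5,       7.4,         8.0],
})
toy_least = pd.DataFrame({
    "categories": ["Tablets", "E-Readers", "Chargers"],
    "product":    ["Tab C",   "Reader Y",  "Charger 9W"],
    "score":      [3.3,       7.4,         6.9],
})

# Rows present in both tables come from single-product categories
shared = toy_least.merge(toy_best[["categories", "product"]], on=["categories", "product"])
print(shared)

# The notebook's cut: keep only products whose mean score is below 5
least_low = toy_least[toy_least["score"] < 5]
print(least_low)
```

The "Chargers" row shows the other case: a product can sit at the bottom of its category and still score well, so it is also removed by the threshold.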

In [105]:
least_criteria_0

Unnamed: 0,index,categories,product,score
2,11,"XYZ brand Tablets,Tablets,Computers & Tablets,...",retail brand - brand name Voyage - 4GB - Wi-Fi...,3.327403


Surprisingly, only a single product, from a single category, falls below this threshold. Therefore, we will proceed with the analysis as follows:

* Retrieve the lowest-scoring review, along with its rating, for every product in the least-selling products DataFrame (`least`).

* Filter on the rating of that review to narrow the list down to genuinely underperforming products.

* Specifically, we keep only the products whose worst review has a rating less than or equal to 3, as these represent products with poor customer reception; products whose worst review is rated above 3 are dropped.
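The loop over `retrieve_review_rating` below can also be written without an explicit loop. A sketch on made-up reviews (an alternative formulation, not the notebook's code): sort ascending by the same keys, take the lowest-ranked review per (category, product) with `groupby().head(1)`, and keep products whose worst review is rated 3 or less. `toy_reviews`, `worst` and `least_selling` are hypothetical names.

```python
import pandas as pd

# Hypothetical review-level data with the columns used in the notebook
toy_reviews = pd.DataFrame({
    "categories":  ["Tablets"] * 3 + ["E-Readers"] * 2,
    "product":     ["Tab A", "Tab A", "Tab B", "Reader X", "Reader X"],
    "reviews":     ["Freezes often", "Great value", "Love it", "Screen died", "Okay"],
    "rating":      [1.0, 5.0, 4.0, 2.0, 3.0],
    "doRecommend": [0.0, 1.0, 1.0, 0.0, 1.0],
    "review_prob": [0.05, 0.99, 0.97, 0.10, 0.60],
    "title_prob":  [0.10, 0.98, 0.95, 0.20, 0.55],
})
toy_reviews["score"] = (toy_reviews["doRecommend"] + toy_reviews["rating"]
                        + toy_reviews["review_prob"] + toy_reviews["title_prob"])

sort_keys = ["score", "rating", "doRecommend", "review_prob", "title_prob"]
worst = (
    toy_reviews.sort_values(sort_keys, ascending=True)
    .groupby(["categories", "product"])
    .head(1)  # lowest-ranked review of each product
)

# Keep products whose worst review has a rating of 3 or less
least_selling = worst[worst["rating"] <= 3][["categories", "product", "reviews", "rating"]]
print(least_selling.reset_index(drop=True))
```

Here "Tab B" is dropped because even its worst review is rated 4.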

In [106]:
least_criteria_1 = least

In [107]:
least_criteria_1 = least_criteria_1.reset_index(drop = True)

In [108]:
def retrieve_review_rating(category, product, df, flag):
  """Return [review, rating] for one (category, product) pair.

  Reviews are sorted in ascending order of score, rating, doRecommend,
  review_prob and title_prob. flag == 1 returns the worst review;
  any other value returns the best one. Returns [None, None] if the
  pair is not found.
  """
  subset = df[(df['categories'] == category) & (df['product'] == product)][['reviews', 'score', 'rating', 'doRecommend', 'review_prob', 'title_prob']]
  subset = subset.sort_values(by=['score', 'rating', 'doRecommend', 'review_prob', 'title_prob'], ascending=True).reset_index(drop=True)
  if len(subset) == 0:
    return [None, None]
  elif flag == 1:
    return [subset['reviews'].iloc[0], subset['rating'].iloc[0]]  # worst review with its (poor) rating
  else:
    return [subset['reviews'].iloc[-1], subset['rating'].iloc[-1]]  # best review with its (good) rating

In [109]:
for i in range(len(least_criteria_1)):
  li = retrieve_review_rating(least_criteria_1.loc[i, 'categories'],least_criteria_1.loc[i, 'product'], data, 1)
  least_criteria_1.loc[i, 'review'], least_criteria_1.loc[i, 'rating'] = li[0],li[1]
least_criteria_1

Unnamed: 0,index,categories,product,score,review,rating
0,0,"eBook Readers,brand name E-readers,Computers &...",brand name Oasis E-reader with Leather Chargin...,7.431744,This is not an upgrade by any means! My three ...,1.0
1,1,"brand name E-readers,Electronics Features,Comp...",Brand New electronics brand IPad16gb 7 Ips Dis...,7.396717,This was a gift for my friend. My friend likes...,5.0
2,11,"XYZ brand Tablets,Tablets,Computers & Tablets,...",retail brand - brand name Voyage - 4GB - Wi-Fi...,3.327403,Its ok for the price if willing to deal with t...,3.0
3,15,"XYZ brand Tablets,Tablets,Computers & Tablets,...",electonics brand Home\n,6.620801,"Full disclosure, I've only had iPads in the pa...",2.0
4,16,"XYZ brand Tablets,Tablets,Computers & Tablets,...",Brand New electronics brand IPad16gb 7 Ips Dis...,7.992606,"Bigger screen, longer battery life and faster ...",5.0
5,18,"Walmart for Business,Office Electronics,Tablet...",retail brand brand name Paperwhite - eBook rea...,7.704,I took it back the day after purchasing it. Th...,1.0
6,26,"Walmart for Business,Office Electronics,Tablet...",retail brand Echo and XYZ brand TV Power Adapt...,6.976,This is the first e-reader I've gotten and I L...,4.0
7,32,"Walmart for Business,Office Electronics,Tablet...","brand name Paperwhite E-reader - White, 6 High...",7.426122,My brand name Voyage is nearly 16 months old a...,1.0
8,34,"Tablets,XYZ brand Tablets,Electronics,Computer...",electronics brand product name Tablet A 10.1 T...,7.412922,I bought this because I have ebooks in college...,1.0
9,39,"Tablets,XYZ brand Tablets,Computers & Tablets,...",retail brand 5W USB Official OEM Charger and P...,6.91451,Dont have option for password ask you before b...,1.0


In [110]:
# Keep only products whose worst review has a rating <= 3
final_least_selling_prods = least_criteria_1[least_criteria_1['rating'] <= 3].reset_index(drop = True)
final_least_selling_prods

Unnamed: 0,index,categories,product,score,review,rating
0,0,"eBook Readers,brand name E-readers,Computers &...",brand name Oasis E-reader with Leather Chargin...,7.431744,This is not an upgrade by any means! My three ...,1.0
1,11,"XYZ brand Tablets,Tablets,Computers & Tablets,...",retail brand - brand name Voyage - 4GB - Wi-Fi...,3.327403,Its ok for the price if willing to deal with t...,3.0
2,15,"XYZ brand Tablets,Tablets,Computers & Tablets,...",electonics brand Home\n,6.620801,"Full disclosure, I've only had iPads in the pa...",2.0
3,18,"Walmart for Business,Office Electronics,Tablet...",retail brand brand name Paperwhite - eBook rea...,7.704,I took it back the day after purchasing it. Th...,1.0
4,32,"Walmart for Business,Office Electronics,Tablet...","brand name Paperwhite E-reader - White, 6 High...",7.426122,My brand name Voyage is nearly 16 months old a...,1.0
5,34,"Tablets,XYZ brand Tablets,Electronics,Computer...",electronics brand product name Tablet A 10.1 T...,7.412922,I bought this because I have ebooks in college...,1.0
6,39,"Tablets,XYZ brand Tablets,Computers & Tablets,...",retail brand 5W USB Official OEM Charger and P...,6.91451,Dont have option for password ask you before b...,1.0
7,49,"Stereos,Remote Controls,retail brand Echo,Audi...","retail brand XYZ brand Tv,,,_x000D_\nretail br...",7.516128,You have to pay for every thing -any thing you...,1.0
8,60,"Stereos,Remote Controls,Target Echo,Audio Dock...",New retail brand brand name XYZ brand Hd 9w Po...,6.460355,This is great for a connected home. People who...,3.0
9,61,"Electronics,iPad & Tablets,All Tablets,XYZ bra...",electronics brand product name Tablet A 10.1 T...,7.463177,Freeze frequently... No way to trouble shoot o...,1.0


In [111]:
for i in range(len(best)):
  li = retrieve_review_rating(best.loc[i, 'categories'],best.loc[i, 'product'], data, 0)
  best.loc[i, 'review'], best.loc[i, 'rating'] = li[0],li[1]
best

Unnamed: 0,index,categories,product,score,review,rating
0,0,"eBook Readers,brand name E-readers,Computers &...",brand name Oasis E-reader with Leather Chargin...,7.431744,"ve owned the brand name Keyboard, 2nd generati...",5.0
1,1,"brand name E-readers,Electronics Features,Comp...",Brand New electronics brand IPad16gb 7 Ips Dis...,7.396717,Was a gift to replace older one and she really...,5.0
2,2,"XYZ brand Tablets,Tablets,Computers & Tablets,...",retail brand - brand name Voyage - 4GB - Wi-Fi...,8.0,Can't beat a tablet for $50 that actually work...,5.0
3,12,"XYZ brand Tablets,Tablets,Computers & Tablets,...",electonics brand Home,7.996,Although this won't be competing with the iPad...,5.0
4,16,"XYZ brand Tablets,Tablets,Computers & Tablets,...",Brand New electronics brand IPad16gb 7 Ips Dis...,7.992606,Love my brand name! So when mother table died ...,5.0
5,17,"Walmart for Business,Office Electronics,Tablet...","brand name Voyage E-reader, 6 High-Resolution ...",7.719881,The brand name paper white is amazing. I love ...,5.0
6,19,"Walmart for Business,Office Electronics,Tablet...",retail brand XYZ brand Hd 8 8in Tablet 16gb Bl...,7.999,"A bit pricey, but worth the money in my opinio...",5.0
7,27,"Walmart for Business,Office Electronics,Tablet...",Certified Refurbished electronics brand TV Sti...,8.0,Bought this for my cousins and he absolutely l...,5.0
8,33,"Tablets,XYZ brand Tablets,Electronics,Computer...","XYZ brand HD 8 Tablet with Alexa, 8 HD Display...",7.830405,I love new brand name XYZ brand. I love the co...,5.0
9,35,"Tablets,XYZ brand Tablets,Computers & Tablets,...",retail brand 5W USB Official OEM Charger and P...,8.0,Fast fun tablet with great speakers compared t...,5.0


In [None]:
print("Vectorizer features:", len(tfidf.get_feature_names_out()))

Vectorizer features: 100


In [None]:
import pickle
# Save the trained review model together with its fitted TF-IDF vectorizer
with open("rf_model_review.pkl", "wb") as f:
    pickle.dump((rf_model_review, tfidf), f)