# DATA ANALYSIS - AMAZON REVIEWS
____
by Vitor Flisch Cavalanti<br>
May 2021

<b>Case study Sr. Business Analyst</b>

Please present your analysis and actionable insights as if you were presenting them to Amazon's management. The "actionable business insights" should be based on data; an example of an analytical approach follows:
 
1) What is the relation between the reviews and the helpfulness?<br>
2) What is the review behavior among different categories?<br>
3) Is there a relationship between price and reviews?<br>
4) Which group of reviewers is more valuable to the business?<br>
5) Is there a relation between reviews from products which are bought together?<br>
6) Optional: any other hypothesis you find interesting, as long as it has business value for Amazon

In [1]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt

In [2]:
# read reviews
reviews = pd.read_csv("../exports/final/reviews_concatenated.csv")

In [3]:
# read metadata
metadata = pd.read_csv("../exports/final/metadata_final.csv")

In [4]:
# read metadata_related
metadata_related = pd.read_csv("../exports/final/metadata_related.csv")

In [5]:
merge = pd.merge(reviews, metadata, on="asin", how="inner")
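An inner join silently drops two kinds of rows: reviews whose `asin` has no metadata row, and metadata rows that have no reviews. The sketch below uses small hypothetical toy frames, not the real exports. It shows how `indicator=True` could measure that loss before relying on `merge`. It also shows how `validate=` could guard against duplicated keys:

```python
import pandas as pd

# Toy stand-ins for the real `reviews` and `metadata` frames (hypothetical data).
reviews = pd.DataFrame({"asin": ["A1", "A1", "B2", "C3"], "overall": [5, 4, 3, 1]})
metadata = pd.DataFrame({"asin": ["A1", "B2", "D4"], "price": [9.99, 24.50, 5.00]})

# An outer merge with indicator=True labels each row by the side(s) it came from,
# so we can count what an inner join would discard.
check = pd.merge(reviews, metadata, on="asin", how="outer", indicator=True)
print(check["_merge"].value_counts())

# validate="many_to_one" raises a MergeError if an asin appears more than once in
# metadata, which would otherwise duplicate reviews in the inner join.
merge = pd.merge(reviews, metadata, on="asin", how="inner", validate="many_to_one")
print(len(merge))
```

In this toy case, one review (`C3`) and one metadata row (`D4`) would be dropped by the inner join.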

<hr><h3>1) What is the relation between the reviews and the helpfulness?</h3><hr>

In [41]:
corr = reviews[~reviews.helpful_score.isna()][['helpful_score','overall']].corr(method='pearson')
corr.style.background_gradient()

| | helpful_score | overall |
|---|---|---|
| helpful_score | 1.0 | 0.269365 |
| overall | 0.269365 | 1.0 |
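According to the pandas documentation, `DataFrame.corr` excludes NA values pairwise. With only two columns, the `~reviews.helpful_score.isna()` filter above is therefore redundant, though harmless. The sketch below checks this on hypothetical toy data. With more than two columns, pairwise exclusion means each coefficient in a matrix may be computed on a different subset of rows:

```python
import numpy as np
import pandas as pd

# Toy frame with missing helpfulness scores (hypothetical values).
df = pd.DataFrame({
    "helpful_score": [0.9, np.nan, 0.4, 0.7, np.nan, 0.2],
    "overall": [5, 3, 2, 4, 1, 1],
})

# Explicit filtering vs relying on corr()'s own pairwise NA exclusion.
r_filtered = df[~df.helpful_score.isna()][["helpful_score", "overall"]].corr().iloc[0, 1]
r_direct = df[["helpful_score", "overall"]].corr().iloc[0, 1]
print(r_filtered, r_direct)
```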


In [46]:
# correlation by group of products
merge[~merge.helpful_score.isna()].groupby('file')[['helpful_score','overall']].corr(method='pearson').unstack().iloc[:,1]

file
reviews_Books_5.csv            0.259805
reviews_Electronics_5.csv      0.254737
reviews_Movies_and_TV_5.csv    0.297498
Name: (helpful_score, overall), dtype: float64
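The `.corr(...).unstack().iloc[:, 1]` idiom computes a full 2x2 matrix per group and then picks its off-diagonal cell. The sketch below shows a possibly more readable alternative that computes the single coefficient per group directly. It runs on a hypothetical toy stand-in for `merge` and checks that both approaches agree:

```python
import pandas as pd

# Toy stand-in for `merge`; `file` identifies the product category (hypothetical values).
merge = pd.DataFrame({
    "file": ["Books"] * 4 + ["Electronics"] * 4,
    "helpful_score": [0.1, 0.5, 0.6, 0.9, 0.8, 0.2, 0.4, 0.3],
    "overall": [1, 3, 4, 5, 5, 2, 3, 1],
})

# Original approach: 2x2 matrix per group, unstacked; column 1 is (helpful_score, overall).
via_unstack = (
    merge.groupby("file")[["helpful_score", "overall"]]
    .corr(method="pearson")
    .unstack()
    .iloc[:, 1]
)

# Alternative: one Pearson coefficient per group, computed directly.
via_apply = merge.groupby("file")[["helpful_score", "overall"]].apply(
    lambda g: g["helpful_score"].corr(g["overall"], method="pearson")
)
print(via_apply)
```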

In [7]:
corr = reviews[~reviews.helpful_score.isna()][['helpful_score','overall']].corr(method='spearman')
corr.style.background_gradient()

| | helpful_score | overall |
|---|---|---|
| helpful_score | 1.0 | 0.268791 |
| overall | 0.268791 | 1.0 |
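`overall` is an ordinal 1 to 5 star rating with many ties, so a rank-based coefficient is arguably the more natural measure here. Spearman's rho is Pearson's r computed on the ranks of the data, with tied values receiving their average rank. The sketch below verifies this identity on hypothetical toy data. It uses the `DataFrame.corr` path, which does not require SciPy. The close agreement of the Pearson (0.269365) and Spearman (0.268791) values above suggests the linear estimate is not noticeably distorted by the ordinal scale:

```python
import pandas as pd

# Toy data (hypothetical) with ties in both columns, as real ratings have.
df = pd.DataFrame({
    "helpful_score": [0.10, 0.35, 0.35, 0.80, 0.95, 0.50],
    "overall": [1, 2, 4, 5, 5, 3],
})

# Spearman's rho as reported by pandas.
rho = df.corr(method="spearman").iloc[0, 1]

# Pearson's r on average ranks: the definition of Spearman's rho with ties.
ranked = df.rank(method="average")
r_on_ranks = ranked.corr(method="pearson").iloc[0, 1]
print(rho, r_on_ranks)
```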


In [45]:
# correlation by group of products
merge[~merge.helpful_score.isna()].groupby('file')[['helpful_score','overall']].corr(method='spearman').unstack().iloc[:,1]

file
reviews_Books_5.csv            0.261090
reviews_Electronics_5.csv      0.260054
reviews_Movies_and_TV_5.csv    0.288505
Name: (helpful_score, overall), dtype: float64

In [8]:
# Removing unused columns
merge.drop(["Unnamed: 0_x","Unnamed: 0.1_x","Unnamed: 0_y","Unnamed: 0.1_y"], axis="columns", inplace=True)
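The `Unnamed: 0` / `Unnamed: 0.1` columns dropped above most likely come from CSVs that were written with their index. `to_csv` writes the index by default, and re-saving an already-unnamed column produces the `.1` suffix. The sketch below uses an in-memory CSV and hypothetical data. It shows how reading with `index_col=0`, or writing with `index=False`, could avoid the extra columns, so that no `_x`/`_y` copies appear after a merge:

```python
import io

import pandas as pd

# Write a small frame with its index (the to_csv default) into an in-memory buffer.
buf = io.StringIO()
pd.DataFrame({"asin": ["A1", "B2"], "overall": [5, 3]}).to_csv(buf)
csv_text = buf.getvalue()

# Reading it back naively turns the saved index into an "Unnamed: 0" column.
naive = pd.read_csv(io.StringIO(csv_text))
print(list(naive.columns))

# Treating the first column as the index avoids the extra column.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(clean.columns))
```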

In [9]:
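# Correlation matrix of the numeric columns; numeric_only=True is required in
# pandas >= 2.0 when the frame also holds text columns such as asin
corr = merge.corr(numeric_only=True)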
corr.style.background_gradient()

| | overall | helpful_score | price | salesrank_value |
|---|---|---|---|---|
| overall | 1.0 | 0.269364 | 0.00437 | -0.016917 |
| helpful_score | 0.269364 | 1.0 | 0.007054 | 0.053198 |
| price | 0.00437 | 0.007054 | 1.0 | -0.109602 |
| salesrank_value | -0.016917 | 0.053198 | -0.109602 | 1.0 |


<hr><h3>3) Is there a relationship between price and reviews?</h3><hr>

In [39]:
corr = merge[['price','overall']].corr(method='pearson')
corr.style.background_gradient()

| | price | overall |
|---|---|---|
| price | 1.0 | 0.00437 |
| overall | 0.00437 | 1.0 |
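A coefficient of 0.00437 is practically zero. With a review table of this size, however, even such a value can come out "statistically significant". A confidence interval makes the distinction between statistical and practical significance explicit. The sketch below uses the standard Fisher z-transform. The row count `n` is a hypothetical placeholder, since the dataset size is not shown in this section:

```python
import math


def pearson_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for Pearson's r via the Fisher z-transform."""
    z = math.atanh(r)            # Fisher transform: approximately normally distributed
    se = 1.0 / math.sqrt(n - 3)  # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# r from the price vs overall output above; n is a hypothetical row count.
low, high = pearson_ci(0.00437, n=1_000_000)
print(f"95% CI: [{low:.4f}, {high:.4f}]")
```

With a million rows the interval excludes zero yet stays well below 0.01. This is a "significant" but commercially negligible relationship.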


In [61]:
# correlation by group of products
merge.groupby('file')[['price','overall']].corr(method='pearson').unstack().iloc[:,1]

file
reviews_Books_5.csv            0.005199
reviews_Electronics_5.csv      0.009020
reviews_Movies_and_TV_5.csv    0.064235
Name: (price, overall), dtype: float64
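A near-zero correlation only rules out a *monotonic* (Spearman) or *linear* (Pearson) relationship. For example, mid-priced items could rate differently from both cheap and expensive ones. The sketch below bins prices into quintiles with `pd.qcut` and compares the average rating per bin. It runs on synthetic random data standing in for `merge`, so its output carries no information about the real reviews:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `merge` (hypothetical prices and ratings).
rng = np.random.default_rng(0)
merge = pd.DataFrame({
    "price": rng.lognormal(mean=3.0, sigma=1.0, size=1_000),
    "overall": rng.integers(1, 6, size=1_000),  # star ratings 1..5
})

# Price quintiles, then mean rating and review count per quintile.
merge["price_bucket"] = pd.qcut(merge["price"], q=5, labels=["Q1", "Q2", "Q3", "Q4", "Q5"])
summary = merge.groupby("price_bucket", observed=True)["overall"].agg(["mean", "count"])
print(summary)
```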

In [11]:
corr = merge[['price','overall']].corr(method='spearman')
corr.style.background_gradient()

| | price | overall |
|---|---|---|
| price | 1.0 | 0.00205 |
| overall | 0.00205 | 1.0 |


In [48]:
# correlation by group of products
merge.groupby('file')[['price','overall']].corr(method='spearman').unstack().iloc[:,1]

file
reviews_Books_5.csv           -0.000627
reviews_Electronics_5.csv     -0.027847
reviews_Movies_and_TV_5.csv    0.065654
Name: (price, overall), dtype: float64

<hr><h3>5) Is there a relation between reviews from products which are bought together?</h3><hr>

In [52]:
# Align the key column name with `metadata` before joining
metadata_related.rename(columns={'asin_prod': 'asin'}, inplace=True)

In [53]:
metadata_related_join = pd.merge(metadata, metadata_related, on="asin", how="inner")
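How `metadata_related` was built is not shown in this section. Its columns (`asin_prod`, `overall_prod`, `overall_bought_together`) suggest one row per product / co-purchased product pair. The sketch below shows one way such a pair table could be derived. It assumes a metadata frame with a per-product average rating and a list column of co-purchased ASINs. That `bought_together` column and all data are hypothetical:

```python
import pandas as pd

# Hypothetical metadata: average rating per product plus co-purchased asins.
meta = pd.DataFrame({
    "asin": ["A1", "B2", "C3", "D4"],
    "overall": [4.5, 3.0, 4.0, 2.5],
    "bought_together": [["B2", "C3"], ["A1"], [], ["A1", "X9"]],
})

# One row per (product, co-purchased product) pair; empty lists become NaN and are dropped.
pairs = (
    meta[["asin", "overall", "bought_together"]]
    .explode("bought_together")
    .dropna(subset=["bought_together"])
    .rename(columns={"asin": "asin_prod", "overall": "overall_prod"})
)

# Attach the partner's rating; the inner join drops partners with no metadata (e.g. "X9").
ratings = meta[["asin", "overall"]].rename(
    columns={"asin": "bought_together", "overall": "overall_bought_together"}
)
pairs = pairs.merge(ratings, on="bought_together", how="inner")
print(pairs)
```

Mutual pairs (here `A1`/`B2`) appear once in each direction, so a correlation over this table counts them twice. Deduplicating unordered pairs first may be worth considering.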

In [55]:
corr = metadata_related[['overall_prod','overall_bought_together']].corr(method='pearson')
corr.style.background_gradient()

| | overall_prod | overall_bought_together |
|---|---|---|
| overall_prod | 1.0 | 0.197802 |
| overall_bought_together | 0.197802 | 1.0 |


In [56]:
# pearson correlation by product group
metadata_related_join.groupby('file')[['overall_prod','overall_bought_together']].corr(method='pearson').unstack().iloc[:,1]

file
reviews_Books_5.csv            0.185315
reviews_Electronics_5.csv      0.131910
reviews_Movies_and_TV_5.csv    0.278140
Name: (overall_prod, overall_bought_together), dtype: float64
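The per-category values above differ noticeably, e.g. 0.278140 for Movies & TV against 0.131910 for Electronics. A standard way to ask whether two correlations from independent samples differ is a z-test on their Fisher-transformed values. The sketch below implements that test. The group sizes are hypothetical placeholders, since pair counts per category are not shown here. The test also assumes independent pairs, which is only approximately true when a product appears in several pairs:

```python
import math


def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between correlations of independent samples."""
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the standard normal
    return z, p


# Pearson values from the output above; group sizes are hypothetical.
z, p = compare_correlations(0.278140, 5_000, 0.131910, 5_000)
print(f"z = {z:.2f}, p = {p:.3g}")
```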

In [13]:
corr = metadata_related[['overall_prod','overall_bought_together']].corr(method='spearman')
corr.style.background_gradient()

| | overall_prod | overall_bought_together |
|---|---|---|
| overall_prod | 1.0 | 0.214458 |
| overall_bought_together | 0.214458 | 1.0 |


In [57]:
# spearman correlation by product group
metadata_related_join.groupby('file')[['overall_prod','overall_bought_together']].corr(method='spearman').unstack().iloc[:,1]

file
reviews_Books_5.csv            0.205445
reviews_Electronics_5.csv      0.148525
reviews_Movies_and_TV_5.csv    0.285696
Name: (overall_prod, overall_bought_together), dtype: float64