# DATA ANALYSIS - AMAZON REVIEWS
____
by Vitor Flisch Cavalanti<br>
May 2021

<b>Case study Sr. Business Analyst</b>

Please present your analysis and actionable insights as if you were presenting to Amazon's management. The "Actionable business insights" should be based on data; an example of an analytical approach is as follows:
 
1) What is the relation between the reviews and the helpfulness?<br>
2) What is the review behavior among different categories?<br>
3) Is there a relationship between price and reviews?<br>
4) Which group of reviewers is more valuable to the business?<br>
5) Is there a relation between reviews from products which are bought together?<br>
6) Optional - Any other hypothesis you think is interesting, as long as it would have business value for Amazon

In [1]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt

In [2]:
# read reviews
reviews = pd.read_csv("../exports/final/reviews_concatenated.csv")

In [3]:
# read metadata
metadata = pd.read_csv("../exports/final/metadata_final.csv")

In [4]:
# read metadata_related
metadata_related = pd.read_csv("../exports/final/metadata_related.csv")

In [5]:
merge = pd.merge(reviews, metadata, on="asin", how="inner")
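
An inner merge on `asin` silently drops every review whose product has no metadata row, and every product that has no reviews. A minimal sketch of how that loss could be measured is shown below. It runs on toy frames rather than the real exports: `toy_reviews`, `toy_metadata` and all their values are hypothetical stand-ins, not data from this analysis.

```python
import pandas as pd

# Toy stand-ins for the real frames; every value here is made up for illustration.
toy_reviews = pd.DataFrame({"asin": ["A1", "A1", "B2", "C3"], "overall": [5, 4, 3, 1]})
toy_metadata = pd.DataFrame({"asin": ["A1", "B2", "D4"], "price": [9.99, 24.50, 3.00]})

# An outer merge with indicator=True adds a "_merge" column that says
# whether each key was found in the left frame, the right frame, or both.
check = pd.merge(toy_reviews, toy_metadata, on="asin", how="outer", indicator=True)
match_counts = check["_merge"].value_counts()

# Rows labelled "both" are exactly the rows an inner merge keeps.
inner = pd.merge(toy_reviews, toy_metadata, on="asin", how="inner")

print(match_counts)
print(len(inner), "rows survive the inner merge")
```

On the real data, a large `left_only` count would mean the correlations computed on `merge` describe only a subset of the reviews.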

<hr><h3>1) What is the relation between the reviews and the helpfulness?</h3><hr>

In [6]:
corr = reviews[~reviews.helpful_score.isna()][['helpful_score','overall']].corr(method='pearson')
corr.style.background_gradient()

,helpful_score,overall
helpful_score,1.0,0.289697
overall,0.289697,1.0


In [7]:
corr = reviews[~reviews.helpful_score.isna()][['helpful_score','overall']].corr(method='spearman')
corr.style.background_gradient()

,helpful_score,overall
helpful_score,1.0,0.284162
overall,0.284162,1.0
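
A single coefficient of about 0.29 says the two variables move together weakly, but not how helpfulness differs from one star rating to the next. One way to look at that is to average `helpful_score` per rating. The sketch below uses a toy frame with the same two column names; the values are invented and assume a score between 0 and 1, which the source does not confirm.

```python
import numpy as np
import pandas as pd

# Toy data with the same column names as `reviews`; values are illustrative only.
toy = pd.DataFrame({
    "overall": [1, 1, 2, 3, 4, 5, 5, 5],
    "helpful_score": [0.2, np.nan, 0.4, 0.5, 0.7, 0.9, 0.8, np.nan],
})

# Drop reviews without a helpfulness score, then summarise per star rating.
by_rating = (
    toy.dropna(subset=["helpful_score"])
       .groupby("overall")["helpful_score"]
       .agg(["mean", "count"])
)

print(by_rating)
```

Keeping the `count` column next to the mean makes it easy to see which ratings rest on too few scored reviews to be trusted.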


In [8]:
# Removing unused columns
merge.drop(["Unnamed: 0_x","Unnamed: 0.1_x","Unnamed: 0_y","Unnamed: 0.1_y"], axis="columns", inplace=True)

In [9]:
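# Correlation matrix over the numeric columns only
# (DataFrame.corr raises on text columns such as asin in pandas >= 2.0)
corr = merge.select_dtypes(include="number").corr()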
corr.style.background_gradient()

,overall,helpful_score,price,salesrank_value
overall,1.0,0.289697,0.034834,-0.032569
helpful_score,0.289697,1.0,0.062004,-0.001457
price,0.034834,0.062004,1.0,-0.178511
salesrank_value,-0.032569,-0.001457,-0.178511,1.0


<hr><h3>3) Is there a relationship between price and reviews?</h3><hr>

In [10]:
corr = merge[['price','overall']].corr(method='pearson')
corr.style.background_gradient()

,price,overall
price,1.0,0.034834
overall,0.034834,1.0


In [11]:
corr = merge[['price','overall']].corr(method='spearman')
corr.style.background_gradient()

,price,overall
price,1.0,0.052551
overall,0.052551,1.0
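
Both coefficients are close to zero, which rules out a strong monotonic link between price and rating but can hide differences between price tiers. A possible follow-up is to bucket prices into quartiles and compare the mean rating per bucket. The sketch below does this on a toy frame; the values and the `price_bucket` column are hypothetical.

```python
import pandas as pd

# Toy data with the same column names as `merge`; values are illustrative only.
toy = pd.DataFrame({
    "price": [5, 8, 12, 20, 35, 50, 80, 120],
    "overall": [4, 5, 3, 4, 4, 5, 5, 4],
})

# Split prices into four equally populated buckets (quartiles).
toy["price_bucket"] = pd.qcut(toy["price"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# Mean rating and number of reviews per price bucket.
by_bucket = (
    toy.groupby("price_bucket", observed=True)["overall"]
       .agg(["mean", "count"])
)

print(by_bucket)
```

Quantile buckets are used instead of fixed-width ones because product prices are usually right-skewed, so equal-width bins would leave most reviews in the first bucket.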


<hr><h3>5) Is there a relation between reviews from products which are bought together?</h3><hr>

In [12]:
corr = metadata_related[['overall_prod','overall_bought_together']].corr(method='pearson')
corr.style.background_gradient()

,overall_prod,overall_bought_together
overall_prod,1.0,0.219566
overall_bought_together,0.219566,1.0


In [13]:
corr = metadata_related[['overall_prod','overall_bought_together']].corr(method='spearman')
corr.style.background_gradient()

,overall_prod,overall_bought_together
overall_prod,1.0,0.230191
overall_bought_together,0.230191,1.0
