# Intro Doug Score - A look from an analytical angle

In this report, I take a quick look at how Doug DeMuro, an avid car reviewer, might be evaluating cars. The analysis starts with the review data itself: the number of brands he has reviewed, the cars' countries of origin, the age of each car, and the filming location. We'll look at all of the factors that go into the Doug Score, based on [Doug's Score Sheet](http://www.dougdemuro.com/dougscore) and our static copy [here]() used for this submission. These features provide the basis for his daily, weekend, and final Doug Score.

Once we've looked at the reviews and drawn some light insights from them, we'll answer a few questions about the relationships between the different features used in a review. Finally, we'll use some machine learning techniques to see if we can learn more about what Doug values in a car, including which sets of features might make a car a top Doug Score car.


### Scores at a glance
### What's the score?
`Total Weekend Score` + `Total Daily Score` = `Dougscore`

### Weekend Score
The weekend score consists of five features, each scored from 1 to 10:

- Styling
- Acceleration
- Handling
- Fun Factor
- Cool Factor

### Daily Score
The daily score consists of five features, each scored from 1 to 10:

- Features
- Comfort
- Quality
- Practical
- Value
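
As a quick sanity check on the 1 to 10 ranges above, the following sketch flags any review whose feature scores fall outside that range. The column names and the toy rows are assumptions made for illustration; they are not taken from `doug_score.csv`.

```python
import pandas as pd

# Assumed column names; rename them to match the actual CSV headers.
WEEKEND_COLS = ["styling", "acceleration", "handling", "fun_factor", "cool_factor"]
DAILY_COLS = ["features", "comfort", "quality", "practical", "value"]


def out_of_range_rows(df: pd.DataFrame, cols, low: int = 1, high: int = 10) -> pd.DataFrame:
    """Return the rows where any of `cols` falls outside [low, high] (inclusive)."""
    in_range = df[cols].apply(lambda s: s.between(low, high))
    return df[~in_range.all(axis=1)]


# Made-up rows for illustration only (not real review data).
toy = pd.DataFrame(
    [
        [7, 8, 6, 9, 5, 4, 6, 7, 3, 8],   # every score within 1..10
        [11, 8, 6, 9, 5, 4, 6, 7, 3, 0],  # 11 and 0 are out of range
    ],
    columns=WEEKEND_COLS + DAILY_COLS,
)

bad = out_of_range_rows(toy, WEEKEND_COLS + DAILY_COLS)
print(bad.index.tolist())
```

An empty result on the real data would suggest the scores were loaded as expected.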

### What other things do we know about the reviews?
These are not quantitative factors, but we have some additional information about each review:
- Car Brand
- Model Year
- Country Filmed in
- Duration of the review
- City Filmed in
- Region Filmed in (e.g., State)

### How's the Doug Score calculated?
Doug provides two categories of score: a weekend score and a daily score. We add up the features within each category, and the two totals combine into the Doug Score.

`total weekend score = styling + acceleration + handling + fun factor + cool factor`

`total daily score = features + comfort + quality + practical + value`

`doug score = total daily score + total weekend score`
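
The three formulas above can be sketched in pandas as follows. This is a minimal illustration, not the notebook's actual pipeline: the column names, the helper `add_score_totals`, and the single toy row are all assumptions.

```python
import pandas as pd

# Assumed column names; rename them to match the actual CSV headers.
WEEKEND_COLS = ["styling", "acceleration", "handling", "fun_factor", "cool_factor"]
DAILY_COLS = ["features", "comfort", "quality", "practical", "value"]


def add_score_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with the weekend, daily, and combined totals added."""
    out = df.copy()
    out["total_weekend_score"] = out[WEEKEND_COLS].sum(axis=1)
    out["total_daily_score"] = out[DAILY_COLS].sum(axis=1)
    out["doug_score"] = out["total_weekend_score"] + out["total_daily_score"]
    return out


# One made-up review, for illustration only.
toy = pd.DataFrame([[7, 8, 6, 9, 5, 4, 6, 7, 3, 8]], columns=WEEKEND_COLS + DAILY_COLS)
scored = add_score_totals(toy)
print(scored[["total_weekend_score", "total_daily_score", "doug_score"]])
```

Since each of the ten features ranges from 1 to 10, each category total ranges from 5 to 50 and the combined score from 10 to 100.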

### What do we really want to know?
What influences the Doug Score, and how?

## Data Overview
In this section we walk through the review data we have and the summary statistics that shaped how we evaluate the Doug Score.

The data at a glance 

### Brands

#### Brand distribution
The brands are not perfectly evenly distributed, but no single car manufacturer is massively overrepresented.
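
One way to put a number on this claim is to compute each brand's share of the reviews. The sketch below assumes a `brand` column like the one used later in this notebook; the helper `brand_share` and the placeholder labels are hypothetical and do not reflect the real data.

```python
import pandas as pd


def brand_share(brands: pd.Series) -> pd.Series:
    """Percentage of reviews per brand, largest share first."""
    return brands.value_counts(normalize=True).mul(100).round(1)


# Placeholder labels for illustration only (not real review counts).
toy_brands = pd.Series(
    ["brand_a", "brand_a", "brand_b", "brand_c", "brand_b", "brand_a", "brand_d", "brand_e"]
)
shares = brand_share(toy_brands)
print(shares)
```

On the real data, `brand_share(df["brand"]).head()` would show how large the top few shares are compared with the rest.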

### Locations


### Car Model Year


### Daily Score Attributes

### Weekend Score Attributes





In [3]:
import numpy as np
import pandas as pd
import seaborn as sns
import os
import matplotlib.pyplot as plt
from IPython.display import display, Markdown


In [6]:
FILE_NAME = "doug_score.csv"
FILE_PATH = os.path.join(os.getcwd(), FILE_NAME)


def create_dashboard(data):
    # Create a new figure with 2 subplots: one for histogram and one for pie chart
    fig, axs = plt.subplots(1, 2, figsize=(15, 7))

    # Histogram
    axs[0].hist(data, bins=10, color='skyblue', edgecolor='black')
    axs[0].set_title('Histogram of Data Distribution')
    axs[0].set_xlabel('Value')
    axs[0].set_ylabel('Frequency')

    # Pie chart for value counts
    unique, counts = np.unique(data, return_counts=True)
    axs[1].pie(counts, labels=unique, autopct='%1.1f%%', startangle=90, colors=plt.cm.Paired.colors)
    axs[1].set_title('Pie Chart of Value Counts')

    # Display the dashboard
    plt.tight_layout()
    plt.show()

import plotly.colors  # explicit submodule import so plotly.colors.qualitative is available
import plotly.graph_objects as go
from plotly.subplots import make_subplots

def create_histogram_x(data, title): 
    fig = go.Figure()
    fig.add_trace(go.Histogram(x=data, 
                               histnorm='percent',
                               marker=dict(color='skyblue', line=dict(color='black', width=1)),
                               name=title))
    fig.update_layout(title_text=title)
    fig.show()

def create_top_n_chart(data, title, n):
    # count how often each unique value occurs
    values, counts = np.unique(data, return_counts=True)
    # convert counts to percentages, sort descending, and keep only the top n
    sorted_data = sorted(zip(values, 100 * counts / len(data)), key=lambda x: x[1], reverse=True)[:n]
    fig = go.Figure()
    fig.add_trace(go.Bar(x=[x[0] for x in sorted_data], y=[x[1] for x in sorted_data],
                         marker=dict(color=plotly.colors.qualitative.Light24),
                         name=title))
    fig.update_layout(title_text=title)
    fig.show()

def create_dashboard_x(data):
    # Create a subplot with 1 row and 2 columns (one for histogram and one for pie chart)
    fig = make_subplots(rows=1, cols=2, 
                        specs=[[{"type": "histogram"}, {"type": "pie"}]],
                        subplot_titles=('Histogram of Data Distribution', 'Pie Chart of Value Counts'))

    # Histogram
    fig.add_trace(go.Histogram(x=data, 
                               marker=dict(color='skyblue', line=dict(color='black', width=1)),
                               name='Data Distribution'),
                  row=1, col=1)

    # Pie chart for value counts
    unique, counts = np.unique(data, return_counts=True)
    fig.add_trace(go.Pie(labels=unique, values=counts, 
                         marker=dict(colors=plotly.colors.qualitative.Light24),
                         name='Value Counts', hole=0.3),
                  row=1, col=2)

    # Update layout for better appearance
    fig.update_layout(title_text="Data Visualization Dashboard", showlegend=False)
    fig.show()

def load_data(file_path: str) -> pd.DataFrame:
    # read from the path passed in rather than the module-level FILE_PATH
    df = pd.read_csv(file_path)
    return df


df = load_data(FILE_PATH)


create_histogram_x(df['brand'], "Brand Distribution %")
create_top_n_chart(df['brand'], "Top 20 Brands %", 20)


# one hot encoding of brand column 
# df = pd.concat([df, pd.get_dummies(df["brand"], drop_first=False).astype(int)], axis=1)
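
The commented-out line above hints at one-hot encoding the `brand` column. A minimal sketch of what that step would produce, using a made-up frame with placeholder brand labels and scores (both are assumptions for illustration):

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({"brand": ["a", "b", "a"], "doug_score": [63, 58, 71]})

# One indicator column per brand; prefix="brand" keeps the new column names readable.
dummies = pd.get_dummies(toy["brand"], prefix="brand", drop_first=False).astype(int)
encoded = pd.concat([toy, dummies], axis=1)
print(encoded)
```

With `drop_first=False` every brand gets its own column. If the encoded columns later feed a linear model, `drop_first=True` may be worth considering, since the full set of indicator columns always sums to 1 in each row.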