# <h1 align="center">Drug Overdose Deaths Analysis, Part 3</h1>

# Introduction <small id='intro'></small>

Welcome to the continuation of our journey with the `drug_deaths` dataset. In this notebook, we explore feature transformation techniques, including **OneHotEncoder** and **BinaryEncoder**, and handle missing values by either dropping or filling them. We also cover feature scaling so that the data is appropriately scaled for later processing.

## Overview
1. [Introduction](#intro)
1. [Imports](#imports)
1. [Load the Data](#load_the_data)
1. [Check Missing values](#missing_values)
1. [Dealing with Missing values](#dealing)
1. [Encoding Process](#encoding)
1. [Split Data to Train and Test sets](#split)
1. [Scaling Process](#scaling)
1. [Summary](#summarize)

# Imports <small id='imports'></small>

In [None]:
import pandas as pd

from sklearn.preprocessing import OneHotEncoder

from category_encoders import BinaryEncoder

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import RobustScaler

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Enable multiple cell outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

# Load the Data <small id='load_the_data'></small>

In [None]:
df = pd.read_csv('Drug_deaths_pt2.csv', dtype={'Year': str})

# Check Missing values <small id='missing_values'></small>

In [None]:
# Number of missing values per column
df.isnull().sum()
# Percentage of missing values per column
df.isnull().mean() * 100
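
The check above reports missing-value counts and percentages as two separate outputs. As a minimal sketch, the two can be combined into a single table sorted by severity. The toy DataFrame and the `missing_summary` helper are hypothetical and are not part of the original notebook.

```python
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, worst first."""
    summary = pd.DataFrame({
        "missing": frame.isnull().sum(),
        "percent": frame.isnull().mean() * 100,
    })
    # Keep only the columns that actually contain gaps
    summary = summary[summary["missing"] > 0]
    return summary.sort_values("percent", ascending=False)

# Toy data standing in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    "Age": [34.0, None, 51.0, 28.0],
    "Sex": ["Male", "Female", None, None],
    "Race": ["White", "Black", "White", "Hispanic"],
})
report = missing_summary(toy)
print(report)
```

A table like this makes it easier to decide, column by column, whether dropping rows or filling values is the more reasonable choice.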

# Dealing with Missing values <small id='dealing'></small>

1. [DateType](#datetype)
1. [Age](#age)
1. [Sex](#sex)
1. [Race](#race)
1. [Location](#location)
1. [DescriptionofInjury](#descofinjry)
1. [InjuryPlace](#injuryplace)
1. [OtherSignifican](#othersigns)
1. [Other](#other)
1. [MannerofDeath](#manner)

###  - `DateType` column <small id='datetype'></small>

In [None]:
# Pass subset as a list; a bare string is only accepted by newer pandas versions
df.dropna(axis=0, subset=['DateType'], inplace=True)

###  - `Age` column <small id='age'></small>

In [None]:
# Drop rows with no recorded age
df.dropna(axis=0, subset=['Age'], inplace=True)

###  - `Sex` column <small id='sex'></small>

In [None]:
# Assign the result back; inplace=True on a column selection may not update df
df['Sex'] = df['Sex'].fillna('Unknown')

###  - `Race` column <small id='race'></small>

In [None]:
# Assign the result back; inplace=True on a column selection may not update df
df['Race'] = df['Race'].fillna('Unknown')

###  - `Location` column <small id='location'></small>

In [None]:
# Drop rows with no recorded location
df.dropna(axis=0, subset=['Location'], inplace=True)

###  - `DescriptionofInjury` column <small id='descofinjry'></small>

In [None]:
# Assign the result back; inplace=True on a column selection may not update df
df['DescriptionofInjury'] = df['DescriptionofInjury'].fillna('No Description')

###  - `InjuryPlace` column <small id='injuryplace'></small>

In [None]:
# Assign the result back; inplace=True on a column selection may not update df
df['InjuryPlace'] = df['InjuryPlace'].fillna('other')

###  - `OtherSignifican` column <small id='othersigns'></small>

In [None]:
# Assign the result back; inplace=True on a column selection may not update df
df['OtherSignifican'] = df['OtherSignifican'].fillna('No Sign')

###  - `Other` column <small id='other'></small>

In [None]:
# Assign the result back; inplace=True on a column selection may not update df
df['Other'] = df['Other'].fillna('others')

###  - `MannerofDeath` column <small id='manner'></small>

In [None]:
# Assign the result back; inplace=True on a column selection may not update df
df['MannerofDeath'] = df['MannerofDeath'].fillna('Unknown')
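
This section follows two strategies: rows are dropped when a key column is missing, and the remaining gaps are filled with a placeholder label. The sketch below shows both on a toy DataFrame, using a dictionary of per-column fill values in one call. The toy data and the `fill_values` name are assumptions for illustration; the placeholder labels mirror the ones used in this notebook.

```python
import pandas as pd

# Toy data standing in for the real dataset (illustrative values only)
toy = pd.DataFrame({
    "Age": [34.0, None, 51.0],
    "Sex": ["Male", None, "Female"],
    "InjuryPlace": [None, "Residence", None],
})

# Strategy 1: drop rows where a required column is missing
toy = toy.dropna(subset=["Age"])

# Strategy 2: fill the remaining gaps with a per-column placeholder
fill_values = {"Sex": "Unknown", "InjuryPlace": "other"}
toy = toy.fillna(value=fill_values)

print(toy)
```

Assigning the result back to the variable, rather than relying on `inplace=True`, keeps the behavior the same across pandas versions.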

# Encoding Process <small id='encoding'></small>

In [None]:
def encode_and_concat(df, column):
    # Create an instance of BinaryEncoder
    encoder = BinaryEncoder()
    
    # Transform the specified column using BinaryEncoder
    transformed_df = encoder.fit_transform(df[[column]])
    
    # Drop the original column first, then append the encoded columns.
    # Dropping first avoids also removing an output column that kept the same name.
    df = pd.concat([df.drop(columns=column), transformed_df], axis=1)
    
    # Return the updated DataFrame
    return df

In [None]:
def encode_all(df, lst):
    # Iterate over the column names
    for column in lst:
        # Encode and concatenate each column using the encode_and_concat function
        df = encode_and_concat(df, column=column)
    
    # Return the updated DataFrame
    return df

In [None]:
# Collect the names of the columns to encode in the 'lst_cols' list
lst_cols = df.columns[2:17].tolist()
lst_cols.append('Other')
lst_cols = lst_cols + df.columns[34:].tolist()

In [None]:
# Apply encode_all, which calls encode_and_concat on every column in lst_cols
df = encode_all(df, lst_cols)
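
To make the idea behind binary encoding concrete, here is a simplified, pure-Python illustration: each category gets an ordinal number, and the binary digits of that number become the new columns. The `binary_encode` helper and the sample values are hypothetical, and `category_encoders.BinaryEncoder` may differ in details such as column naming and the handling of unseen or missing values.

```python
def binary_encode(values):
    """Map each category to the binary digits of its 1-based ordinal index."""
    # Assign ordinals in order of first appearance, starting at 1
    ordinals = {}
    for value in values:
        ordinals.setdefault(value, len(ordinals) + 1)
    # Number of bits needed to represent the largest ordinal
    width = max(ordinals.values()).bit_length()
    return [
        [int(bit) for bit in format(ordinals[value], f"0{width}b")]
        for value in values
    ]

# Illustrative category values only
races = ["White", "Black", "Hispanic", "White", "Asian", "Unknown"]
encoded = binary_encode(races)
for race, bits in zip(races, encoded):
    print(race, bits)

# Column counts for one-hot encoding versus this binary scheme
n_categories = len(set(races))
print("one-hot columns:", n_categories)
print("binary columns:", len(encoded[0]))
```

With five distinct categories, one-hot encoding would need five columns, while this scheme needs three. The gap widens as the number of categories grows, which is the usual reason to prefer binary encoding for high-cardinality columns.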

# Split Data to Train and Test sets <small id='split'></small>

In [None]:
x = df.drop('AnyOpioid', axis=1)
y = df['AnyOpioid']

In [None]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

In [None]:
x_train

In [None]:
y_train

# Scaling Process <small id='scaling'></small>

In [None]:
scaler = RobustScaler()

In [None]:
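x_train[['Age']] = scaler.fit_transform(x_train[['Age']])  # fit on the training set only
x_test[['Age']] = scaler.transform(x_test[['Age']])  # reuse the fitted statistics for the test set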

In [None]:
x_train
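
With its default settings, `RobustScaler` subtracts the median and divides by the interquartile range (IQR). The sketch below reproduces that arithmetic with plain pandas on made-up ages, and shows the statistics being learned from the training values only and then reused for the test values. The sample numbers and variable names are assumptions for illustration.

```python
import pandas as pd

# Made-up ages; 90 acts as an outlier in the training data
train_age = pd.Series([20.0, 30.0, 40.0, 50.0, 90.0])
test_age = pd.Series([40.0, 60.0])

# "Fit": learn the center and spread from the training data only
median = train_age.median()
iqr = train_age.quantile(0.75) - train_age.quantile(0.25)

# "Transform": apply the same statistics to both sets
train_scaled = (train_age - median) / iqr
test_scaled = (test_age - median) / iqr

print(train_scaled.tolist())
print(test_scaled.tolist())
```

Because the median and IQR are barely affected by the outlier at 90, the bulk of the values stays in a compact range around zero, which is the main motivation for choosing a robust scaler over mean and standard deviation scaling.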

# Summary <small id='summarize'></small>

**Analyzing Drug Overdose Deaths Dataset**

In this analysis journey, we explored the "Drug Overdose Deaths" dataset, focusing on various data preprocessing and exploratory data analysis techniques. The following steps were performed to gain insights from the dataset:

1. **Data Understanding:** We examined the structure and content of the dataset, including the columns, data types, and any missing values.

2. **Univariate Analysis:** We performed an examination of individual variables in the dataset. This analysis involved analyzing the distribution, identifying outliers, and calculating summary statistics. Visualizations like histograms and bar plots were utilized to gain insights into each variable.

3. **Bivariate Analysis:** We explored relationships between pairs of variables. This analysis allowed us to understand dependencies, correlations, or associations between different features. Techniques such as scatter plots and correlation matrices were employed to investigate these relationships.

4. **Data Cleaning:** We analyzed the dataset for missing values and applied suitable strategies to handle them. Missing values were either dropped or filled using appropriate imputation techniques.

5. **Answering Questions:** Throughout the analysis, we formulated specific questions related to the dataset and provided answers based on our findings. We supported our conclusions with appropriate statistical techniques and visualizations.

6. **Data Encoding:** Categorical or textual columns in the dataset were encoded to numerical values. This enabled us to incorporate these features in further analysis by transforming them into suitable representations.

7. **Feature Scaling:** To prevent features with large numeric ranges from dominating others, we applied feature scaling. This step rescaled the `Age` feature with `RobustScaler`, which uses the median and interquartile range and is therefore less sensitive to outliers.


In conclusion, this analysis of the "Drug Overdose Deaths" dataset uncovered valuable insights about drug-related fatalities. By performing both univariate and bivariate analyses, addressing missing values, encoding categorical data, and scaling features, we gained a comprehensive understanding of the dataset. The findings can be used to inform future research, policy-making, and interventions related to drug overdose prevention and public health.

# There you go!