## {{cookiecutter.project_name}}

{{cookiecutter.description}}

### Data Sources
- file1 : Description of where this file came from

### Changes
- {% now 'utc', '%m-%d-%Y' %} : Started project

In [1]:
import sys
import pandas as pd
import numpy as np
from pathlib import Path
from datetime import datetime
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Add the utilities folder to sys.path to support sibling imports in the utility modules,
# where they import each other by bare name, e.g. "import sibling_module_name"
sys.path.append('./utilities/')

# The utilities prefix is not needed now that the folder is on sys.path,
# but we'll include it here for clarity
from utilities import dataframe_operations as u_do
from utilities import column_name_mapping as u_cnm
from utilities import data_types as u_dt
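
The relative `'./utilities/'` entry only resolves correctly when the notebook's working directory is the project root. A minimal sketch of a more defensive variant follows, assuming the `utilities` folder sits in the working directory; the helper name `add_to_sys_path` is hypothetical and not part of the template.

```python
import sys
from pathlib import Path


def add_to_sys_path(folder):
    """Append the absolute form of `folder` to sys.path once; return it as a string."""
    resolved = str(Path(folder).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


# Assumption: the notebook runs from the project root, so ./utilities lives there.
utilities_path = add_to_sys_path("./utilities")
```

Storing the absolute path keeps the imports working if the working directory changes later in the session, and the membership check avoids duplicate entries when the cell is re-run.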

### File Locations

In [None]:
today = datetime.today()
in_file = Path.cwd() / "data" / "raw" / "FILE1"
summary_file = Path.cwd() / "data" / "processed" / f"summary_{today:%b-%d-%Y}.pkl"

In [None]:
df = pd.read_csv(in_file)

### Early mergers

### Column Cleanup

- Remove all leading and trailing spaces from the column names.
- Rename the columns for consistency.

In [None]:
# https://stackoverflow.com/questions/30763351/removing-space-in-dataframe-python
df.columns = [x.strip() for x in df.columns]

In [None]:
name_mapper = u_cnm.default_name_mapper(df)
# name_mapper.print()

In [None]:
# To use if renaming several columns: prints a "'old_name': ''," line per column
# that can be pasted into cols_to_rename and filled in.
# max_len = max(len("'" + x + "': ") for x in df.columns)
# for col in df.columns:
#     print(u_cnm.fixed_width_formatter(max_len).format("'" + col + "':") + "'',")

cols_to_rename = {}
name_mapper.add_dict(cols_to_rename)
df.columns = name_mapper.map_column_names(df)

### Clean Up Data Types

In [None]:
dtype_summary = u_dt.dtype_summarizer(df)

In [None]:
dtype_summary.print_type('object')

Comment on results

<div style="padding:10px;border-style:double;border-color:orange"><b><u>Actions</u></b>

- action one

- ...
</div>

<div style="padding:10px;border-style:double;border-color:green"><b><u>Summary, data type actions</u></b>

- action one

- ...
</div>
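
Typical data type actions can be sketched with plain pandas. The frame and column names below (`amount`, `signup_date`, `segment`) are hypothetical stand-ins, not columns from the real input file, and the conversions shown are only examples of what the actions above might turn into.

```python
import pandas as pd

# Hypothetical data: every column starts out as strings (dtype object).
demo = pd.DataFrame({
    "amount": ["1.5", "2", "n/a"],
    "signup_date": ["2021-01-05", "2021-02-10", "2021-03-15"],
    "segment": ["a", "b", "a"],
})

# Unparseable strings become NaN instead of raising an error.
demo["amount"] = pd.to_numeric(demo["amount"], errors="coerce")
# An explicit format avoids ambiguous day/month parsing.
demo["signup_date"] = pd.to_datetime(demo["signup_date"], format="%Y-%m-%d")
# Low-cardinality text columns are often stored as categoricals.
demo["segment"] = demo["segment"].astype("category")

print(demo.dtypes)
```

Using `errors="coerce"` silently turns bad values into missing ones, so it is worth checking the missing-value counts before and after the conversion.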

### Outliers

<i>Using boxplots is a simple way of spotting outliers. For an explanation of the outlier classification criteria, see e.g. https://stats.stackexchange.com/questions/149161/confused-by-location-of-fences-in-box-whisker-plots/149178#149178</i>
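
The fence rule that boxplots conventionally use can also be applied directly to a column to list the flagged rows. This is a minimal sketch assuming the common 1.5 * IQR criterion; the helper name `iqr_outlier_mask` and the `value` column are hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(series, k=1.5):
    """Return a boolean mask that is True for values outside the k * IQR fences."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return (series < lower_fence) | (series > upper_fence)


# Hypothetical data with one obviously extreme value.
demo = pd.DataFrame({"value": [10, 11, 12, 12, 13, 14, 15, 95]})
outliers = demo[iqr_outlier_mask(demo["value"])]
print(outliers)
```

Since `sns.boxplot` places its whiskers using 1.5 * IQR by default, the points it draws beyond the whiskers should generally correspond to the rows this mask selects.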

### Missing values
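
A first look at missing values usually starts with a per-column count. The sketch below uses a small hypothetical frame in place of the real `df`; the column names are placeholders.

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns "a" and "b" contain gaps, "c" is complete.
demo = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", "y", None, "z"],
    "c": [1, 2, 3, 4],
})

# Count and percentage of missing entries per column.
missing = pd.DataFrame({
    "n_missing": demo.isna().sum(),
    "pct_missing": demo.isna().mean() * 100,
})
# Keep only columns that actually have gaps, worst first.
missing = missing[missing["n_missing"] > 0].sort_values("n_missing", ascending=False)
print(missing)
```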

### Data Manipulation

### Save output file into processed directory

Save the cleaned data to a file in the processed directory. It will be read in and used later for further analysis.

Other options besides pickle include:
- feather
- msgpack (deprecated in pandas 0.25 and removed in pandas 1.0)
- parquet
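
Switching between these formats can be sketched as a small helper that picks the writer from the file suffix. The helper name `save_frame` is hypothetical, and the parquet and feather branches assume an optional engine such as `pyarrow` is installed; only the pickle branch is exercised here.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_frame(df, path):
    """Write df in the format implied by the file suffix (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    suffix = path.suffix.lower()
    if suffix == ".pkl":
        df.to_pickle(path)
    elif suffix == ".parquet":
        df.to_parquet(path)  # requires pyarrow or fastparquet
    elif suffix == ".feather":
        df.to_feather(path)  # requires pyarrow
    else:
        raise ValueError(f"Unsupported output format: {suffix!r}")
    return path


# Round trip through pickle in a temporary directory.
demo = pd.DataFrame({"id": [1, 2], "label": ["a", "b"]})
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = save_frame(demo, Path(tmp_dir) / "processed" / "summary.pkl")
    restored = pd.read_pickle(out_path)
```

Pickle preserves pandas dtypes exactly but is Python-specific, whereas parquet and feather are readable from other tools, so the choice depends on who consumes the processed file.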

In [None]:
summary_file.parent.mkdir(parents=True, exist_ok=True)  # ensure data/processed exists
df.to_pickle(summary_file)