### Step 1: Create a package named `DfCleaner`

### Step 2: Read the sample CSV file
* Read the `sample_employees.csv` file into a DataFrame and use it to test your functions

### Step 3: Create separate functions in `__init__.py`
1. Import pandas.
2. `drop_empty` takes a pandas DataFrame as input and drops every row that contains an empty value. Use pandas' `dropna`.
    * Use the sample `DataFrame` to test
3. `fill_empty` takes a pandas DataFrame and a column name as inputs and fills the given column's empty values with the column mean. Use pandas' `fillna`.
    * Use the sample `DataFrame` to test
4. `drop_column` takes a pandas DataFrame and a column name as inputs, drops the given column, and returns the rest of the DataFrame. Use pandas' `drop`.
    * Use the sample `DataFrame` to test
5. `fix_index` takes a pandas DataFrame as input, resets its index, and returns it. Use pandas' `reset_index` and drop the old index column.
    * Use the sample `DataFrame` to test
6. `fix_dates` takes a pandas DataFrame and a column name as inputs and converts the given column's data type to datetime. Use pandas' `to_datetime` function.
    * Use the sample `DataFrame` to test

### Step 4: Convert your package into a `class`-based package
#### Create a new file called `cleaner.py` and copy the code from `__init__.py` into it.
#### Your `__init__.py` must remain empty.

1. Convert all of your package functions into class *methods*
2. Import your package like the following: `from DfCleaner.cleaner import Cleaner`
3. Test your class methods using the `sample_employees.csv` file
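
A minimal sketch of what `cleaner.py` could look like after the conversion. The attribute name `df` and the pattern of methods that update `self.df` and return nothing follow the class-based cells later in this notebook. The `.copy()` in `__init__`, the use of `assign` instead of direct column assignment, and the toy data are assumptions made for illustration, not the original implementation.

```python
import pandas as pd


class Cleaner:
    """Wraps a DataFrame and cleans it step by step via self.df."""

    def __init__(self, df):
        # Assumption: copy the input so the caller's DataFrame is not modified
        self.df = df.copy()

    def drop_empty(self):
        # Drop every row that has at least one missing value
        self.df = self.df.dropna()

    def fill_empty(self, column_name):
        # Fill missing values in one column with that column's mean
        mean = self.df[column_name].mean()
        self.df = self.df.assign(**{column_name: self.df[column_name].fillna(mean)})

    def drop_column(self, column_name):
        self.df = self.df.drop(columns=column_name)

    def fix_index(self):
        # drop=True discards the old index instead of keeping it as a column
        self.df = self.df.reset_index(drop=True)

    def fix_dates(self, column_name):
        converted = pd.to_datetime(self.df[column_name])
        self.df = self.df.assign(**{column_name: converted})


# Tiny made-up frame standing in for sample_employees.csv
sample = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cy"],
    "employed_date": ["9/7/2014", None, "4/15/2018"],
    "bonus": [10.0, 2.0, None],
})
cleaner = Cleaner(sample)
cleaner.fill_empty("bonus")
cleaner.drop_empty()
cleaner.fix_index()
cleaner.fix_dates("employed_date")
print(cleaner.df)
```

Filling before dropping keeps rows whose only gap is in the filled column; running `drop_empty` first, as the notebook cells do, leaves nothing for `fill_empty` to fill.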




In [1]:
import DfCleaner
import pandas as pd

In [2]:
df = pd.read_csv("sample_employees.csv")

In [3]:
df

Unnamed: 0,first_name,last_name,email,gender,employed_date,department,salary,bonus
0,Rosamond,Dourin,rdourin0@xing.com,Female,,Business Development,91048.0,10.0
1,Mira,Giamo,mgiamo1@ameblo.jp,Female,,Legal,146665.0,2.0
2,August,Nelmes,anelmes2@bbb.org,Male,9/7/2014,Accounting,161814.0,14.0
3,Carla,Franzotto,,Female,6/19/2015,Human Resources,168816.0,8.0
4,Paxon,Partrick,ppartrick4@drupal.org,Male,4/15/2018,Accounting,136121.0,
...,...,...,...,...,...,...,...,...
995,Cele,Drennan,cdrennanrn@google.co.uk,Female,,Product Management,144649.0,
996,Cristy,Nortunen,cnortunenro@freewebs.com,Female,1/28/2016,Engineering,120389.0,2.0
997,Grete,Elcoate,gelcoaterp@merriam-webster.com,Female,5/25/2017,Support,86544.0,
998,Wilfrid,Canadas,,Male,12/19/2018,Marketing,92586.0,


In [5]:
# These functions will be moved to __init__.py in DfCleaner later

In [6]:
# This is the 1st function
def drop_empty(df):
    return df.dropna()

In [7]:
len(df)

1000

In [8]:
len(drop_empty(df))

464

In [24]:
df = drop_empty(df)

In [29]:
for i in range(len(df)):
    print(df['first_name'][i])

KeyError: 0
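
The `KeyError: 0` arises because `dropna` keeps the original row labels. Rows 0 and 1 of the sample contain empty values and are dropped, so `df['first_name'][0]` looks up a label that no longer exists. A small sketch with a made-up frame (the data is illustrative, not taken from `sample_employees.csv`) contrasts label-based and position-based access:

```python
import pandas as pd

toy = pd.DataFrame({"first_name": ["Ann", "Bob", "Cy"],
                    "bonus": [None, 2.0, 3.0]})
cleaned = toy.dropna()

# dropna keeps the original labels, so label 0 is gone
print(list(cleaned.index))

# Label-based lookup of the missing label raises KeyError
try:
    cleaned["first_name"][0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Position-based access works regardless of the labels
first_by_position = cleaned["first_name"].iloc[0]

# Iterating over the values avoids indexing altogether
names = [name for name in cleaned["first_name"]]
print(lookup_failed, first_by_position, names)
```

Resetting the index, as `fix_index` does further down, is the other way to make `range(len(df))` line up with the labels again.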

In [16]:
# This is the 2nd function
def fill_empty(df, column_name):
    # Returns only the filled column (a Series); assign it back to update the DataFrame
    return df[column_name].fillna(df[column_name].mean())

In [18]:
fill_empty(df, 'bonus')

0      10.000000
1       2.000000
2      14.000000
3       8.000000
4       7.699856
         ...    
995     7.699856
996     2.000000
997     7.699856
998     7.699856
999     3.000000
Name: bonus, Length: 1000, dtype: float64

In [21]:
# 3rd function
def drop_column(df, column_name):
    return df.drop(columns=column_name)

In [22]:
# 4th function
def fix_index(df):
    return df.reset_index(drop=True)

In [33]:
df = fix_index(df)

In [34]:
for i in range(len(df)):
    print(df['first_name'][i])

August
Whittaker
Timothee
Shell
Dill
Biddy
Rawley
Chrissy
Randie
Alicea
Hilliard
Sisile
Halie
Kailey
Teressa
Konstantin
Conway
Marcia
Virgilio
Marie-ann
Troy
Kaylil
Chiarra
Natasha
Gram
Tilda
Easter
Georges
Dodi
Flinn
Lorens
Jonathon
Fredra
Horatio
Anet
Lanni
Thorpe
Teodora
Alick
Crissy
Vinny
Kimberly
Brenna
Granny
Camella
Daile
Free
Dilan
Tessi
Mellie
Christian
Elianore
Sutton
Philippine
Thatch
Giacinta
Val
Leoine
Paige
Jemimah
Erin
Fletcher
Gawain
Danica
Justine
Isadore
Hart
Hoyt
Papageno
Laureen
Jacinthe
Noble
Huberto
Rodolph
Ellwood
Carmencita
Nannie
Erin
Kip
Solomon
Eloisa
Nap
Meredith
Briana
Bondy
Barny
Albert
Dolf
Cacilia
Ellie
Hubey
Laurent
Francis
Mehetabel
Stanislaus
Erny
Elnore
Leanora
Orrin
Roxane
Frederica
Moises
Jocelyne
Kiersten
Rivkah
Farlay
Chan
Rachel
Michaella
Willabella
Montague
Elie
Gayleen
Mei
Hermia
Kurtis
Tisha
Niccolo
Damita
Cosmo
Karin
Isidor
Godard
Andriette
Euell
Derwin
Hortensia
Marcelline
Sherry
Bogey
Paulita
Avrom
Chloette
Rhett
Lyndsey
Gregor
Sarah
Willo

In [40]:
# 5th function
def fix_dates(df, column_name):
    # to_datetime is a top-level pandas function, not a Series method,
    # so df[column_name].to_datetime() would raise AttributeError
    return pd.to_datetime(df[column_name])

In [41]:
fix_dates(df, 'employed_date')

0     2014-09-07
1     2018-04-13
2     2013-09-05
3     2016-02-27
4     2018-02-19
         ...    
459   2016-05-09
460   2017-04-24
461   2018-08-15
462   2016-01-28
463   2013-03-27
Name: employed_date, Length: 464, dtype: datetime64[ns]
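
`pd.to_datetime` infers the date format when none is given. Sample values such as `9/7/2014` and `12/19/2018` look like month/day/year, so passing an explicit `format` makes that reading visible instead of leaving it to inference. The sketch below is one possible variant; the helper name `fix_dates_strict`, the default format string, and the choice of `errors="coerce"` are assumptions for illustration and are not part of the assignment.

```python
import pandas as pd


def fix_dates_strict(df, column_name, date_format="%m/%d/%Y"):
    # Hypothetical variant of fix_dates with an explicit format.
    # errors="coerce" turns unparseable values into NaT instead of raising.
    return pd.to_datetime(df[column_name], format=date_format, errors="coerce")


# Made-up values in the same style as the employed_date column
toy = pd.DataFrame({"employed_date": ["9/7/2014", "12/19/2018", "not a date"]})
parsed = fix_dates_strict(toy, "employed_date")
print(parsed)
```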

In [4]:
# TESTING PACKAGE
df = DfCleaner.drop_empty(df)

In [6]:
df['bonus'] = DfCleaner.fill_empty(df, 'bonus')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['bonus'] = DfCleaner.fill_empty(df, 'bonus')
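
The message above is the text of pandas' `SettingWithCopyWarning`. A plausible cause, depending on the pandas version in use, is that the frame returned by `dropna` is still flagged as derived from the original, so assigning a column into it triggers the warning. A hedged sketch of two common ways around it: take an explicit `.copy()` after dropping rows, or build a new frame with `assign` rather than assigning into the existing one. The toy data and the `subset=["email"]` argument are illustrative, chosen so that a missing `bonus` value remains to be filled.

```python
import pandas as pd

toy = pd.DataFrame({"email": ["a@x.org", None, "c@x.org"],
                    "bonus": [10.0, 4.0, None]})

# Option A: an explicit copy makes the filtered frame independent,
# so column assignment is unambiguous
cleaned = toy.dropna(subset=["email"]).copy()
cleaned["bonus"] = cleaned["bonus"].fillna(cleaned["bonus"].mean())

# Option B: assign returns a new frame and never writes into the old one
cleaned_b = toy.dropna(subset=["email"]).assign(
    bonus=lambda d: d["bonus"].fillna(d["bonus"].mean())
)

print(cleaned)
print(cleaned_b)
```

Either way the original frame is left untouched, which also makes the notebook cells safe to re-run out of order.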


In [8]:
df = DfCleaner.drop_column(df, 'department')

In [10]:
df = DfCleaner.fix_index(df)

In [11]:
df['employed_date'] = DfCleaner.fix_dates(df, 'employed_date')

In [12]:
df

Unnamed: 0,first_name,last_name,email,gender,employed_date,salary,bonus
0,August,Nelmes,anelmes2@bbb.org,Male,2014-09-07,161814.0,14.0
1,Whittaker,Cluet,wcluet6@angelfire.com,Male,2018-04-13,114440.0,11.0
2,Timothee,McVane,tmcvane8@opensource.org,Male,2013-09-05,117430.0,8.0
3,Shell,Zecchinelli,szecchinellib@blogtalkradio.com,Male,2016-02-27,154034.0,13.0
4,Dill,Iglesia,diglesiac@g.co,Male,2018-02-19,64334.0,4.0
...,...,...,...,...,...,...,...
459,Theobald,Hatley,thatleyri@state.tx.us,Male,2016-05-09,175423.0,12.0
460,Franky,McDell,fmcdellrk@pinterest.com,Female,2017-04-24,85666.0,12.0
461,Maurine,Greeson,mgreesonrl@census.gov,Female,2018-08-15,95521.0,7.0
462,Cristy,Nortunen,cnortunenro@freewebs.com,Female,2016-01-28,120389.0,2.0


In [3]:
import pandas as pd
from DfCleaner.cleaner import Cleaner

In [4]:
df = pd.read_csv("sample_employees.csv")
new_df = Cleaner(df)

In [5]:
new_df.drop_empty()

In [6]:
new_df.df

Unnamed: 0,first_name,last_name,email,gender,employed_date,department,salary,bonus
2,August,Nelmes,anelmes2@bbb.org,Male,9/7/2014,Accounting,161814.0,14.0
6,Whittaker,Cluet,wcluet6@angelfire.com,Male,4/13/2018,Services,114440.0,11.0
8,Timothee,McVane,tmcvane8@opensource.org,Male,9/5/2013,Support,117430.0,8.0
11,Shell,Zecchinelli,szecchinellib@blogtalkradio.com,Male,2/27/2016,Product Management,154034.0,13.0
12,Dill,Iglesia,diglesiac@g.co,Male,2/19/2018,Support,64334.0,4.0
...,...,...,...,...,...,...,...,...
990,Theobald,Hatley,thatleyri@state.tx.us,Male,5/9/2016,Human Resources,175423.0,12.0
992,Franky,McDell,fmcdellrk@pinterest.com,Female,4/24/2017,Business Development,85666.0,12.0
993,Maurine,Greeson,mgreesonrl@census.gov,Female,8/15/2018,Research and Development,95521.0,7.0
996,Cristy,Nortunen,cnortunenro@freewebs.com,Female,1/28/2016,Engineering,120389.0,2.0


In [8]:
new_df.fill_empty('bonus')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.df[column_name] = self.df[column_name].fillna(self.df[column_name].mean())


In [10]:
new_df.drop_column('bonus')

In [11]:
new_df.df

Unnamed: 0,first_name,last_name,email,gender,employed_date,department,salary
2,August,Nelmes,anelmes2@bbb.org,Male,9/7/2014,Accounting,161814.0
6,Whittaker,Cluet,wcluet6@angelfire.com,Male,4/13/2018,Services,114440.0
8,Timothee,McVane,tmcvane8@opensource.org,Male,9/5/2013,Support,117430.0
11,Shell,Zecchinelli,szecchinellib@blogtalkradio.com,Male,2/27/2016,Product Management,154034.0
12,Dill,Iglesia,diglesiac@g.co,Male,2/19/2018,Support,64334.0
...,...,...,...,...,...,...,...
990,Theobald,Hatley,thatleyri@state.tx.us,Male,5/9/2016,Human Resources,175423.0
992,Franky,McDell,fmcdellrk@pinterest.com,Female,4/24/2017,Business Development,85666.0
993,Maurine,Greeson,mgreesonrl@census.gov,Female,8/15/2018,Research and Development,95521.0
996,Cristy,Nortunen,cnortunenro@freewebs.com,Female,1/28/2016,Engineering,120389.0
