# Tricks to parse date columns with Pandas read_csv()

This is a notebook for the Medium article [4 tricks you should know to parse date columns with Pandas read_csv()](https://towardsdatascience.com/4-tricks-you-should-know-to-parse-date-columns-with-pandas-read-csv-27355bb2ad0e).

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

In [3]:
import pandas as pd

### 1. Reading date columns from a CSV file

In [4]:
# Without parse_dates, date columns are read as plain strings (object dtype)
df = pd.read_csv('data/data_1.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   date     3 non-null      object
 1   product  3 non-null      object
 2   price    3 non-null      int64 
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes


In [5]:
df = pd.read_csv('data/data_1.csv', parse_dates=['date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   date     3 non-null      datetime64[ns]
 1   product  3 non-null      object        
 2   price    3 non-null      int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 200.0+ bytes


In [6]:
df

        date product  price
0 2019-01-01       A     10
1 2020-01-02       B     20
2 1998-01-03       C     30
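
To make the difference between the two reads above easy to check without the data file, here is a minimal sketch. The inline CSV text is a hypothetical stand-in for `data/data_1.csv` (its real contents are not shown in this notebook), and `is_datetime64_any_dtype` is used only to test the resulting dtype.

```python
import io

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical inline data; the real data/data_1.csv may look different.
csv_text = (
    "date,product,price\n"
    "2019-01-01,A,10\n"
    "2020-01-02,B,20\n"
    "1998-01-03,C,30\n"
)

# Same text read twice: once as-is, once with the date column parsed.
raw = pd.read_csv(io.StringIO(csv_text))
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(is_datetime64_any_dtype(raw["date"]))     # strings, not datetimes
print(is_datetime64_any_dtype(parsed["date"]))  # parsed to datetime

# Once parsed, the .dt accessor exposes date components.
years = parsed["date"].dt.year.tolist()
print(years)
```

A parsed column supports the `.dt` accessor, date arithmetic, and time-based filtering, none of which work on the raw string column.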


### 2. Day first input

In [7]:
pd.read_csv('data/data_1.csv', 
            parse_dates=['date'], 
            dayfirst=True)

        date product  price
0 2019-01-01       A     10
1 2020-02-01       B     20
2 1998-03-01       C     30
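
The effect of `dayfirst` only shows up on ambiguous strings such as `1/2/2020`. The sketch below compares both readings side by side. The inline CSV text is an assumption made for illustration, chosen so that the parsed values match the outputs above; it is not necessarily what `data/data_1.csv` contains.

```python
import io

import pandas as pd

# Hypothetical inline data with ambiguous day/month values.
csv_text = (
    "date,product,price\n"
    "1/1/2019,A,10\n"
    "1/2/2020,B,20\n"
    "1/3/1998,C,30\n"
)

# Default: ambiguous dates are read month first (1/2/2020 is January 2).
month_first = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# dayfirst=True: the same strings are read day first (1/2/2020 is February 1).
day_first = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], dayfirst=True)

print(month_first["date"].dt.month.tolist())
print(day_first["date"].dt.month.tolist())
```

Note that `dayfirst` is a hint for ambiguous input rather than a strict format; when the layout is known in advance, an explicit format string is the safer choice.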


### 3. Combining multiple columns to a datetime

In [8]:
df = pd.read_csv('data/data_2.csv',
                 parse_dates=[['year', 'month', 'day']])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   year_month_day  4 non-null      datetime64[ns]
 1   product         4 non-null      object        
 2   price           4 non-null      int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 224.0+ bytes


In [9]:
# custom column name
df = pd.read_csv('data/data_2.csv',
                 parse_dates={ 'date': ['year', 'month', 'day'] })
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   date     4 non-null      datetime64[ns]
 1   product  4 non-null      object        
 2   price    4 non-null      int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 224.0+ bytes
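
Recent pandas versions deprecate combining columns inside `read_csv`, so an alternative worth knowing is to read the columns as-is and assemble them afterwards with `pd.to_datetime`. This is a minimal sketch of that approach; the inline CSV text is a hypothetical stand-in for `data/data_2.csv`.

```python
import io

import pandas as pd

# Hypothetical inline data; the real data/data_2.csv may look different.
csv_text = (
    "year,month,day,product,price\n"
    "2019,1,1,A,10\n"
    "2019,1,2,B,20\n"
    "2019,1,3,C,30\n"
    "2019,1,4,D,40\n"
)

df2 = pd.read_csv(io.StringIO(csv_text))

# to_datetime accepts a DataFrame whose columns are named year, month, day.
df2["date"] = pd.to_datetime(df2[["year", "month", "day"]])

# Drop the component columns once the combined column exists.
df2 = df2.drop(columns=["year", "month", "day"])
print(df2)
```

Unlike the `parse_dates` version, the new column is appended at the end rather than placed first; reorder the columns if the position matters.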


### 4. Customizing date parser

In [10]:
# Write your own parser to support a different date format,
# for example, YYYY-DD-MM HH:MM:SS

from datetime import datetime

# Note: date_parser is deprecated since pandas 2.0 in favor of date_format
def custom_date_parser(x):
    return datetime.strptime(x, "%Y-%d-%m %H:%M:%S")

df = pd.read_csv('data/data_3.csv',
                 parse_dates=['date'],
                 date_parser=custom_date_parser)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype         
---  ------   --------------  -----         
 0   date     3 non-null      datetime64[ns]
 1   product  3 non-null      object        
 2   price    3 non-null      int64         
dtypes: datetime64[ns](1), int64(1), object(1)
memory usage: 200.0+ bytes


In [11]:
df

                 date product  price
0 2016-10-06 20:30:00       A     10
1 2016-01-07 19:45:30       B     20
2 2013-12-10 04:05:01       C     20
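
Because `date_parser` is deprecated from pandas 2.0 onward, a version-independent alternative is to read the column as text and convert it with `pd.to_datetime` and an explicit `format`. The sketch below assumes inline data in the YYYY-DD-MM layout, reverse-engineered from the output above; it is a stand-in for `data/data_3.csv`, not its actual contents.

```python
import io

import pandas as pd

# Hypothetical inline data in YYYY-DD-MM HH:MM:SS layout.
csv_text = (
    "date,product,price\n"
    "2016-06-10 20:30:00,A,10\n"
    "2016-07-01 19:45:30,B,20\n"
    "2013-10-12 04:05:01,C,20\n"
)

df3 = pd.read_csv(io.StringIO(csv_text))

# Explicit format: year, then day, then month.
df3["date"] = pd.to_datetime(df3["date"], format="%Y-%d-%m %H:%M:%S")
print(df3)
```

With an explicit format the whole column is converted in one vectorized call, which is typically faster than calling a Python function per value. On pandas 2.0 and later, passing the same format string to `read_csv` via `date_format` together with `parse_dates=['date']` should give the same result.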
