https://towardsdatascience.com/3-practical-differences-between-astype-and-to-datetime-in-pandas-fe2c0bfc7678

In [7]:
import pandas as pd
df = pd.read_csv("Dummy_dates_sales.csv")
df.head()

                       Dates  Sales
0  2020-10-09 23:58:40+00:00  14164
1  2018-02-13 20:37:30+00:00    542
2  2022-11-19 05:45:14+00:00   7190
3  2020-04-03 23:21:34+00:00   1133
4  2022-09-12 05:36:48+00:00   2612


In [10]:
%%timeit
df["Dates"].astype("datetime64[ns, UTC]")

565 µs ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [11]:
%%timeit
pd.to_datetime(df.Dates)

811 µs ± 23.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
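
The timings above suggest `astype` is faster here. However, `pd.to_datetime` was not given a format, so part of the gap may come from format inference. The sketch below re-runs the comparison without `Dummy_dates_sales.csv`. It makes these assumptions:

- The input is a synthetic column built by repeating the five sample timestamps shown earlier. The 200 repetitions and 50 runs are arbitrary choices.
- A third variant passes `format="ISO8601"`, which is available in pandas 2.0 and later.
- `utc=True` is passed to `pd.to_datetime` so every variant ends up tz-aware in UTC.

Results will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Synthetic stand-in for the "Dates" column of Dummy_dates_sales.csv
# (assumption: the real file is not available, so the samples are repeated).
sample = [
    "2020-10-09 23:58:40+00:00",
    "2018-02-13 20:37:30+00:00",
    "2022-11-19 05:45:14+00:00",
    "2020-04-03 23:21:34+00:00",
    "2022-09-12 05:36:48+00:00",
]
dates = pd.Series(sample * 200)

variants = {
    "astype": lambda: dates.astype("datetime64[ns, UTC]"),
    "to_datetime": lambda: pd.to_datetime(dates, utc=True),
    "to_datetime ISO8601": lambda: pd.to_datetime(dates, format="ISO8601", utc=True),
}

# Time each variant; absolute numbers depend on hardware and pandas version.
for name, fn in variants.items():
    seconds = timeit.timeit(fn, number=50)
    print(f"{name:<20} {seconds / 50 * 1e6:8.1f} µs per call")

# All three conversions should agree element by element.
reference = variants["astype"]()
for fn in variants.values():
    assert (fn() == reference).all()
```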


In [12]:
df = pd.DataFrame({"Dates": ["2022-12-25", "2021-12-01", "2022-08-30"]})

In [13]:
df["NewDate_using_astype()"] = df["Dates"].astype("datetime64[ns]")
df["NewDate_using_to_datetime()"] = pd.to_datetime(df["Dates"])

df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column                       Non-Null Count  Dtype         
---  ------                       --------------  -----         
 0   Dates                        3 non-null      object        
 1   NewDate_using_astype()       3 non-null      datetime64[ns]
 2   NewDate_using_to_datetime()  3 non-null      datetime64[ns]
dtypes: datetime64[ns](2), object(1)
memory usage: 204.0+ bytes


        Dates  NewDate_using_astype()  NewDate_using_to_datetime()
0  2022-12-25              2022-12-25                   2022-12-25
1  2021-12-01              2021-12-01                   2021-12-01
2  2022-08-30              2022-08-30                   2022-08-30
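
The `df.info()` output shows that both routes produce a datetime column for ISO-formatted strings. The sketch below rebuilds the same three-row frame and checks that the two conversions also agree value by value.

```python
import pandas as pd

df = pd.DataFrame({"Dates": ["2022-12-25", "2021-12-01", "2022-08-30"]})
via_astype = df["Dates"].astype("datetime64[ns]")
via_to_datetime = pd.to_datetime(df["Dates"])

# Element-wise comparison ignores column names and the datetime resolution
# pandas picks, so it checks only the parsed values.
same_values = (via_astype == via_to_datetime).all()
print(same_values)                   # True
print(via_astype.dt.year.tolist())   # [2022, 2021, 2022]
```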


In [14]:
df = pd.DataFrame({"Dates": ["2022-25-12", "2021-01-12", "2022-30-08"]})
df["NewDate_using_astype()"] = df["Dates"].astype("datetime64[ns]")

DateParseError: month must be in 1..12: 2022-25-12, at position 0
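
`astype` has no `format` parameter. It therefore reads `2022-25-12` as year-month-day and rejects month 25.

If `astype` must be used anyway, one workaround is to reorder the fields into ISO order with a regular expression before converting. This sketch assumes every value follows the same `YYYY-DD-MM` layout; it is not the approach taken in the source article.

```python
import pandas as pd

dates = pd.Series(["2022-25-12", "2021-01-12", "2022-30-08"])

# Swap the day and month groups: YYYY-DD-MM -> YYYY-MM-DD.
# Assumes every value has exactly this layout; other values are left unchanged.
iso = dates.str.replace(r"^(\d{4})-(\d{2})-(\d{2})$", r"\1-\3-\2", regex=True)
converted = iso.astype("datetime64[ns]")
print(converted.dt.strftime("%Y-%m-%d").tolist())
# ['2022-12-25', '2021-12-01', '2022-08-30']
```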

In [15]:
df = pd.DataFrame({"Dates": ["2022-25-12", "2021-01-12", "2022-30-08"]})
df["NewDate_using_to_datetime()"] = pd.to_datetime(df["Dates"],
                                                   format='%Y-%d-%m')
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column                       Non-Null Count  Dtype         
---  ------                       --------------  -----         
 0   Dates                        3 non-null      object        
 1   NewDate_using_to_datetime()  3 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(1)
memory usage: 180.0+ bytes


        Dates  NewDate_using_to_datetime()
0  2022-25-12                   2022-12-25
1  2021-01-12                   2021-12-01
2  2022-30-08                   2022-08-30
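
The `format` string follows Python's `strftime`/`strptime` directives. In `%Y-%d-%m`, the second field is read as the day and the third as the month. The sketch below parses the first value with both the standard library and pandas using the same format.

```python
from datetime import datetime

import pandas as pd

fmt = "%Y-%d-%m"  # year, then day, then month

# Standard-library parse of a single string.
py_value = datetime.strptime("2022-25-12", fmt)

# pandas parse of the same string with the same directives.
pd_value = pd.to_datetime("2022-25-12", format=fmt)

print(py_value)              # 2022-12-25 00:00:00
print(pd_value == py_value)  # True
```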


In [16]:
df1 = pd.DataFrame({"Dates": ["2022-12-25", "2021-12-20", "2022-12-b", "2023-07-15", "2020- -31"]})
df1.info()
df1.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Dates   5 non-null      object
dtypes: object(1)
memory usage: 172.0+ bytes


        Dates
0  2022-12-25
1  2021-12-20
2   2022-12-b
3  2023-07-15
4   2020- -31


In [17]:
# Using pandas.Series.astype()
df1["Dates"] = df1["Dates"].astype("datetime64[ns]")
df1.info()

ValueError: time data "2022-12-b" doesn't match format "%Y-%m-%d", at position 2. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

In [18]:
# Using pandas.to_datetime()
df1["Dates"] = pd.to_datetime(df1["Dates"])
df1.info()

ValueError: time data "2022-12-b" doesn't match format "%Y-%m-%d", at position 2. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
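
Both calls stop at the first entry that fails (`2022-12-b`, at position 2), so each error reveals only one bad value at a time.

One way to list every malformed entry up front is a shape check with `Series.str.fullmatch` before converting. This sketch assumes valid values must look exactly like `YYYY-MM-DD`. A shape check does not catch out-of-range values such as month 13, so the later conversion can still raise.

```python
import pandas as pd

df1 = pd.DataFrame({"Dates": ["2022-12-25", "2021-12-20", "2022-12-b", "2023-07-15", "2020- -31"]})

# Assumed layout: four-digit year, two-digit month, two-digit day.
looks_valid = df1["Dates"].str.fullmatch(r"\d{4}-\d{2}-\d{2}")
print(df1.loc[~looks_valid, "Dates"].tolist())  # ['2022-12-b', '2020- -31']

# Convert only the rows that pass the shape check.
parsed = df1.loc[looks_valid, "Dates"].astype("datetime64[ns]")
```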

In [19]:
df1["Dates-astype-ignore"] = df1["Dates"].astype("datetime64[ns]", errors='ignore')
df1["Dates-to_datetime-ignore"] = pd.to_datetime(df1["Dates"], errors='ignore')

df1.info()
df1.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Dates                     5 non-null      object
 1   Dates-astype-ignore       5 non-null      object
 2   Dates-to_datetime-ignore  5 non-null      object
dtypes: object(3)
memory usage: 252.0+ bytes


        Dates  Dates-astype-ignore  Dates-to_datetime-ignore
0  2022-12-25           2022-12-25                2022-12-25
1  2021-12-20           2021-12-20                2021-12-20
2   2022-12-b            2022-12-b                 2022-12-b
3  2023-07-15           2023-07-15                2023-07-15
4   2020- -31            2020- -31                 2020- -31
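
With `errors='ignore'`, both methods hand back the original input as soon as any value fails. That is why all three columns stay `object`.

Starting with pandas 2.2, `errors='ignore'` in `pd.to_datetime` is deprecated; the recommended alternative is to catch the exception explicitly. The sketch below shows that pattern with a hypothetical helper, `to_datetime_or_original`.

```python
import pandas as pd


def to_datetime_or_original(series: pd.Series) -> pd.Series:
    """Hypothetical helper: mimic errors='ignore' without the deprecated flag."""
    try:
        return pd.to_datetime(series)
    except (ValueError, TypeError):
        # At least one value could not be parsed: return the input unchanged.
        return series


clean = pd.Series(["2022-12-25", "2021-12-20"])
dirty = pd.Series(["2022-12-25", "2022-12-b"])

print(to_datetime_or_original(clean).dtype)  # a datetime64 dtype
print(to_datetime_or_original(dirty).dtype)  # the original string dtype
```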


In [20]:
df1["Dates-to_datetime-coerce"] = pd.to_datetime(df1["Dates"], errors='coerce')
df1.info()
df1.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Dates                     5 non-null      object        
 1   Dates-astype-ignore       5 non-null      object        
 2   Dates-to_datetime-ignore  5 non-null      object        
 3   Dates-to_datetime-coerce  3 non-null      datetime64[ns]
dtypes: datetime64[ns](1), object(3)
memory usage: 292.0+ bytes


        Dates  Dates-astype-ignore  Dates-to_datetime-ignore  Dates-to_datetime-coerce
0  2022-12-25           2022-12-25                2022-12-25                2022-12-25
1  2021-12-20           2021-12-20                2021-12-20                2021-12-20
2   2022-12-b            2022-12-b                 2022-12-b                       NaT
3  2023-07-15           2023-07-15                2023-07-15                2023-07-15
4   2020- -31            2020- -31                 2020- -31                       NaT
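
`Series.astype` accepts only `errors='raise'` or `errors='ignore'`, so `errors='coerce'` is available only through `pd.to_datetime`. It is the only option here that produces a `datetime64[ns]` column, at the cost of turning unparseable values into `NaT`.

Because the original strings are kept, the coerced result can also show exactly which inputs were lost. A short sketch:

```python
import pandas as pd

raw = pd.Series(["2022-12-25", "2021-12-20", "2022-12-b", "2023-07-15", "2020- -31"])
parsed = pd.to_datetime(raw, errors="coerce")

# A value is "lost" if it was present in the input but became NaT.
lost = parsed.isna() & raw.notna()
print(raw[lost].tolist())  # ['2022-12-b', '2020- -31']
print(f"{lost.sum()} of {len(raw)} values could not be parsed")  # 2 of 5 values could not be parsed
```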
