<center><h1>Cleaning Dates in R</h1></center>

# 1. The _lubridate_ Package

  - Extremely powerful R package for working with dates and timestamps
  - Part of the _tidyverse_ family of packages (e.g., _dplyr_, _ggplot2_, _stringr_)

In [2]:
# load packages and read in data

library(dplyr)
library(lubridate)

arrests_df <- read.csv("./data/pvd_arrests_2020-10-03.csv")

## 1.1 Working with Timestamps
  - The _lubridate_ package has many built-in functions for timestamp data
  - It can often recognize a timestamp stored as a string, without an explicit parsing step

In [3]:
ts <- "2020-10-11 02:30:59"     # ISO 8601 format

year(ts)                        

In [4]:
month(ts)

In [6]:
day(ts)

### 1.1.1 Extracting Time

In [7]:
ts <- "2020-10-11 02:30:59"

hour(ts)
minute(ts)
second(ts)

In [10]:
am(ts)             # is it AM time (i.e., morning)?

dst(ts)            # is daylight saving time in effect?
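
The result of `dst()` depends on the time zone attached to the timestamp. The sketch below is an illustration rather than part of the original notebook: it assumes the `America/New_York` zone is relevant to the data and that the system's time zone database is available. A timestamp parsed without a `tz` argument is treated as UTC, which never observes daylight saving time.

```r
library(lubridate)

# Same clock time, parsed in two different time zones
ts_utc <- ymd_hms("2020-10-11 02:30:59")                            # UTC by default
ts_ny  <- ymd_hms("2020-10-11 02:30:59", tz = "America/New_York")   # assumed local zone

dst(ts_utc)   # FALSE: UTC has no daylight saving time
dst(ts_ny)    # TRUE: US daylight saving time ran until 1 November 2020
am(ts_ny)     # TRUE: 02:30 is before noon
```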

### 1.1.2 Extracting Day-of-Week

In [11]:
ts <- "2020-10-11 02:30:59"

wday(ts)

In [13]:
toString(wday(ts, label = TRUE))
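
As a brief sketch of how the arguments to `wday()` interact: by default it numbers days starting from Sunday (unless the `lubridate.week.start` option has been changed), and `label = TRUE` returns an ordered factor whose text depends on the current locale. The example assumes an English locale and default options.

```r
library(lubridate)

ts <- "2020-10-11 02:30:59"                  # a Sunday

sunday_first <- wday(ts)                     # 1 when weeks start on Sunday (default)
monday_first <- wday(ts, week_start = 1)     # 7 when weeks start on Monday

wday(ts, label = TRUE, abbr = FALSE)         # full day name as an ordered factor
day_chr <- as.character(wday(ts, label = TRUE))   # abbreviation as a plain string
```

`as.character()` gives the same result as `toString()` for a single timestamp, but it also keeps one element per timestamp when applied to a vector.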

## 1.2 Other Timestamp Formats

In [14]:
ts2 <- "2020-10-11"

toString(wday(ts2, label = TRUE))

In [21]:
ts3 <- as_datetime("20201011")

toString(wday(ts3, label = TRUE))

### 1.2.1 Non ISO 8601 Format
  - We can also tell the _lubridate_ package how to parse non-obvious timestamps

In [23]:
ts3 <- "October 11, 2020"

month(ts3)             

ERROR: Error in as.POSIXlt.character(x, tz = tz(x)): character string is not in a standard unambiguous format


In [25]:
mdy(ts3)             # Month-day-year format (also dmy(), ymd(), and others)

In [26]:
month(mdy(ts3))
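
The other parsers follow the same naming pattern: the letters in the function name give the order of the components. The strings below are made up for illustration. `parse_date_time()` is shown as one possible way to handle a column that mixes formats, by listing several candidate orders.

```r
library(lubridate)

d1 <- mdy("10/11/2020")                  # month-day-year
d2 <- dmy("11-10-2020")                  # day-month-year
t1 <- ymd_hms("2020-10-11 02:30:59")     # date plus time, UTC unless tz is given

# Mixed formats in one vector: try each candidate order in turn
mixed <- parse_date_time(c("10/11/2020", "2020-10-11"),
                         orders = c("mdy", "ymd"))
```

Note that a string such as "10/11/2020" is ambiguous on its own; the choice between `mdy()` and `dmy()` has to come from knowledge of the data source.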

# 2. Math with Timestamps

  - The _lubridate_ package also makes it easy to do math with dates and times

In [29]:
time1 <- as_datetime("2020-10-11 03:45:52")
time2 <- as_datetime("2020-10-13 23:41:09")

time2 - time1

Time difference of 2.830058 days
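
Subtraction returns a `difftime` object, and R picks the display units automatically. If a specific unit is needed, one option is to convert explicitly, as in this sketch using the same two timestamps:

```r
library(lubridate)

time1 <- as_datetime("2020-10-11 03:45:52")
time2 <- as_datetime("2020-10-13 23:41:09")

elapsed <- time2 - time1

elapsed_secs  <- as.numeric(elapsed, units = "secs")    # 244517
elapsed_hours <- as.numeric(elapsed, units = "hours")   # about 67.92

difftime(time2, time1, units = "mins")   # request the units up front
as.duration(elapsed)                     # lubridate duration, stored in seconds
```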

## 2.1 Date/Time Intervals

In [30]:
time1 <- as_datetime("2020-10-12")
time2 <- as_datetime("2020-10-15")


dt_intr <- interval(time1, time2)

In [37]:
as_datetime("2020-10-13") %within% dt_intr

In [38]:
now() %within% dt_intr
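
Intervals support a few other operations beyond membership tests. The following sketch reuses `dt_intr` and adds a second, hypothetical interval (`other`) purely to demonstrate an overlap check.

```r
library(lubridate)

time1 <- as_datetime("2020-10-12")
time2 <- as_datetime("2020-10-15")
dt_intr <- interval(time1, time2)

len_secs <- int_length(dt_intr)     # length in seconds: 259200
len_days <- dt_intr / ddays(1)      # length in days: 3

on_boundary <- time2 %within% dt_intr   # TRUE: the endpoints are included

# Hypothetical second interval, used only to illustrate int_overlaps()
other <- interval(as_datetime("2020-10-14"), as_datetime("2020-10-20"))
overlapping <- int_overlaps(dt_intr, other)   # TRUE: both cover 14-15 October
```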

# 3. Arrests by Day-of-Week

  - Suppose we want to explore the number of arrests by the day of the week
  

## 3.1 Create `day_of_week()` Function

In [39]:

day_of_week <- function(timestamps) {
    
    n <- length(timestamps)  # get length of input column
    day <- rep("", n)        # allocate vector for day of week
    
    # iterate over elements of input column and return 
    # the day of the week for each timestamp
    
    for (i in seq_len(n)) {   # seq_len() handles empty input safely
        day[i] <- toString(wday(timestamps[i], label = TRUE))
    }
    return(day)
}
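
Because `wday()` is vectorized, the loop is not strictly necessary. A shorter alternative, offered here as a sketch rather than a replacement for the function above, is shown below; `day_of_week_vec()` and the sample timestamps are hypothetical names and values used only for illustration.

```r
library(lubridate)

# Hypothetical vectorized variant of day_of_week()
day_of_week_vec <- function(timestamps) {
    # wday() processes the whole vector at once;
    # as.character() turns the ordered factor into plain strings
    as.character(wday(timestamps, label = TRUE))
}

# Made-up timestamps in the same format as the arrest_date column
sample_ts <- c("2019-08-24T02:23:00.0", "2019-08-25T10:00:00.0")
days <- day_of_week_vec(sample_ts)
```

On a large column, this form avoids the per-element function call overhead of the loop.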


### 3.1.1 Creating `weekday` Column
   - Now we can use our newly created `day_of_week()` function to add a new column

In [40]:
# use our `day_of_week()` function to create a new column
# in our original dataframe

arrests_df$weekday <- day_of_week(arrests_df$arrest_date)

In [42]:
# use head() to examine updated dataframe

head(arrests_df)

| | arrest_date | year | month | gender | race | ethnicity | year_of_birth | age | from_address | from_city | from_state | statute_type | statute_code | statute_desc | counts | case_number | arresting_officers | arrestee_id | weekday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<chr>` | `<int>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` |
| 1 | 2019-08-24T02:23:00.0 | 2019 | 8 | Male | White | NonHispanic | 1981 | 37 | No Permanent Address | providence | Rhode Island | | | | | 2019-00084142 | YGonzalez, LTaveras | pvd2218242150382148273 | Sat |
| 2 | 2019-08-24T02:02:00.0 | 2019 | 8 | | | | 1994 | 25 | SUMMER AVE | Cranston | Rhode Island | RI Statute Violation | 31-11-18 | Driving after Denial, Suspension or Revocation of License | 1.0 | 2019-00084127 | NManfredi | pvd15166785558364246202 | Sat |
| 3 | 2019-08-24T02:02:00.0 | 2019 | 8 | Female | Black | NonHispanic | 1984 | 34 | DOUGLAS AVE | Providence | Rhode Island | RI Statute Violation | 12-7-10 | RESISTING LEGAL OR ILLEGAL ARREST | 1.0 | 2019-00084126 | MPlace, JPerez, ASantos | pvd3142917706201385905 | Sat |
| 4 | 2019-08-24T02:02:00.0 | 2019 | 8 | Female | Black | NonHispanic | 1984 | 34 | DOUGLAS AVE | Providence | Rhode Island | RI Statute Violation | 11-45-1 | DISORDERLY CONDUCT | 1.0 | 2019-00084126 | MPlace, JPerez, ASantos | pvd3142917706201385905 | Sat |
| 5 | 2019-08-24T02:02:00.0 | 2019 | 8 | Female | Black | Unknown | 2001 | 18 | TRASH ST | | | RI Statute Violation | 12-7-10 | RESISTING LEGAL OR ILLEGAL ARREST | 1.0 | 2019-00084126 | MPlace, JPerez, ASantos | pvd460449304532374599 | Sat |
| 6 | 2019-08-24T02:02:00.0 | 2019 | 8 | Female | Black | Unknown | 2001 | 18 | TRASH ST | | | RI Statute Violation | 11-45-1 | DISORDERLY CONDUCT | 1.0 | 2019-00084126 | MPlace, JPerez, ASantos | pvd460449304532374599 | Sat |


### 3.1.2 Counts by `weekday`

We can now obtain the counts by day of the week using the `table()` function. We simply pass it the column of the dataframe for which we want a tabular summary.

In [43]:
# use table() to get counts of arrests by `weekday`

table(arrests_df$weekday)


 Fri  Mon  Sat  Sun  Thu  Tue  Wed 
1278 1164 1277 1293 1178 1323 1242 
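
Because `weekday` is stored as a character column, `table()` sorts the days alphabetically. One way to get calendar order instead is to keep the ordered factor that `wday(..., label = TRUE)` returns. The sketch below uses a small made-up vector (`sample_dates`) in place of `arrests_df$arrest_date`, and assumes the default week start of Sunday.

```r
library(lubridate)

# Made-up timestamps standing in for arrests_df$arrest_date
sample_dates <- c("2019-08-24T02:23:00.0", "2019-08-25T11:00:00.0",
                  "2019-08-26T09:15:00.0", "2019-08-24T23:59:00.0")

# Keep the ordered factor instead of converting to character
weekday_fct <- wday(sample_dates, label = TRUE)

# Counts appear in calendar order, including days with zero observations
weekday_counts <- table(weekday_fct)
weekday_counts
```

The same idea applies to the full dataframe by assigning the factor to a column before calling `table()`.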