## Working with Dates and Times in Python

Working with dates and times is a vital skill, because many data sets include date/time information, such as:

- Weather data with dates and/or times.
- Computer logs with the timestamp for each event.
- Sales data with date/time range included.

Unfortunately, working with date/time data is often a lot more complex than working with plain numbers or text:

- Where you have a compound date format, like January 1, 1901, separating each component value and converting it to its numeric form is cumbersome.
- There are many different formats, e.g. 12-hour time versus 24-hour time.
- Adding and subtracting across date/time boundaries isn't easy. For instance, if we wanted to add 32 minutes to a duration of 1 hour 35 minutes, we would need to account for the fact that there are 60 minutes in an hour to come up with the correct answer, 2 hours 7 minutes.
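
To make the last point concrete, here is a minimal sketch of that addition done by hand in plain Python. The variable names are illustrative and are not part of the lesson's data set.

```python
# Add 32 minutes to a duration of 1 hour 35 minutes by hand.
hours, minutes = 1, 35
extra_minutes = 32

# Convert everything to minutes, add, then split back into hours and minutes.
total_minutes = hours * 60 + minutes + extra_minutes
result_hours, result_minutes = divmod(total_minutes, 60)

print(f"{result_hours} hours {result_minutes} minutes")  # 2 hours 7 minutes
```

Even this small case needs a unit conversion in both directions, which is the kind of bookkeeping a date/time library can take over.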

Luckily, Python comes with functionality that makes working with dates and times easier. In this mission, we'll learn this functionality while working with a data set of White House visitors.

__Data source:__ https://www.kaggle.com/somertonman/dq-lessons-potus-white-house#potus_visitors_2015.csv (which only includes visitors who met with the president in 2015)

In December 2009, the White House started publishing records of visitors to the White House. Over a seven-year span, almost six million visitor records were published. The records contain data from the __WAVES (Workers and Visitors Entry System)__ appointment system that is used to make appointments for all White House visitors, excluding staff members and other people not categorized as visitors.

The full set of records can be found on the <a href="https://obamawhitehouse.archives.gov/briefing-room/disclosures/visitor-records">Obama White House Archives Site</a>.

Here are descriptions of each column:

- __name__: The name of the visitor.
- __appt_made_date__: The date and time that the appointment was created.
- __appt_start_date__: The date and time that the appointment was scheduled to start.
- __appt_end_date__: The date and time that the appointment was scheduled to end.
- __visitee_namelast__: The last name of the visitee (the person the visitor was meeting with).
- __visitee_namefirst__: The first name of the visitee.
- __meeting_room__: The room in which the appointment was scheduled.
- __description__: Optional comments added by the WAVES operator.

In this mission we'll learn techniques that will allow us to:

- Calculate the month with the most visitors.
- Calculate the most common time that visits occurred.
- Calculate summary statistics on visit length and how far ahead visits are booked.
- Produce neatly formatted summaries of daily visits.

In [1]:
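from csv import reader

# Open the file with a context manager so it is closed once it has been read.
with open('data/potus_visitors_2015.csv') as opened_file:
    potus = list(reader(opened_file))

potus_header = potus[0]  # The first row holds the column names
potus_data = potus[1:]   # The remaining rows hold the visitor records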

We used the _csv_ module to make reading CSV files easier. In Python, a __module__ is simply a collection of variables, functions, and/or classes (which we'll collectively call 'definitions') that can be imported into a Python script.

Python contains many __standard modules__ that help us perform various tasks, such as performing advanced mathematical operations, working with specific file formats and databases, and working with dates and times.

Whenever we use definitions from a module, we first need to __import__ those definitions. There are a number of ways we can import modules and their definitions using the import statement:

1. __Import the whole module by name.__ This is the most common method for importing a module.
    
    import csv
    
    csv.reader()

2. __Import the whole module with an alias.__ This is especially useful if a module's name is long and we need to type it a lot.
    
    import csv as c
    
    c.reader()

3. __Import one or more definitions from the module by name.__ This is the technique we've used so far. This technique is useful if you want only a single or select definitions and don't want to import everything.
    
    from csv import reader
    
    reader()
    
4. __Import all definitions with a wildcard.__ This is useful if you want to import and use many definitions from a module, although it can make it harder to tell which module a given name came from.
    
    from csv import *
    
    reader()
    writer()
    get_dialect()
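
As a rough check that the first three import styles all reach the same function, the sketch below imports `csv` each way and parses a small in-memory sample. The sample text and the use of `io.StringIO` are assumptions made for illustration and are not part of the White House data.

```python
import csv
import csv as c
from csv import reader
from io import StringIO

# All three names refer to the very same function object.
same_function = csv.reader is c.reader and c.reader is reader

# Hypothetical two-line CSV text, wrapped in a file-like object.
sample = StringIO("name,meeting_room\nJane Doe,Room A\n")
rows = list(reader(sample))

print(same_function)
print(rows)
```

Whichever style is used, only the name we type changes; the function that runs is the same.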

### The Datetime Module

Python has three standard modules that are designed to help us work with dates and times:

- The _calendar_ module
- The _time_ module
- The _datetime_ module

The datetime module contains a number of classes, including:

- __datetime.datetime__: For working with date and time data.
- __datetime.time__: For working with time data only.
- __datetime.timedelta__: For representing time periods.

In [2]:
import datetime as dt

The _datetime.datetime_ class is the most commonly used class from the datetime module, and has attributes and methods designed to work with data containing both the date and time.

The signature of the class is below (with some lesser-used parameters omitted):

datetime.datetime(year, month, day, hour=0, minute=0, second=0)

In [3]:
ibm_founded = dt.datetime(1911, 6, 16)
man_on_moon = dt.datetime(1969, 7, 20, 20, 17)
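
As a quick illustration of the signature above, the sketch below shows that omitted time arguments fall back to their defaults of 0. It rebuilds the two objects so that it runs on its own.

```python
import datetime as dt

ibm_founded = dt.datetime(1911, 6, 16)          # No time given
man_on_moon = dt.datetime(1969, 7, 20, 20, 17)  # Hour and minute given

print(ibm_founded)  # 1911-06-16 00:00:00
print(man_on_moon)  # 1969-07-20 20:17:00
```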

### Using Strptime to Parse Strings as Dates

In our POTUS data set, the date values are stored as strings: the date is in month/day/year format, and the time that follows it is in 24-hour format.

Using what we know so far, we could convert these values into datetime objects by manually splitting the string, converting the variables to numeric types and then instantiating a datetime object using the resultant values.
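
A minimal sketch of that manual approach is shown below. The sample string is hypothetical and assumes a two-digit year, and adding 2000 to the year is an assumption that only holds for dates in this century.

```python
import datetime as dt

sample = "12/18/15 16:30"  # Hypothetical value, not taken from the data set

# Split the date from the time, then split each into its components.
date_part, time_part = sample.split()
month, day, year = date_part.split("/")
hour, minute = time_part.split(":")

# Convert each component to an integer; assume the two-digit year means 20xx.
manual_date = dt.datetime(2000 + int(year), int(month), int(day),
                          int(hour), int(minute))
print(manual_date)  # 2015-12-18 16:30:00
```

This works, but every new date layout would need its own splitting logic.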

- Luckily, there is an easier way: using a special __constructor__.
    - Classes can have additional constructors, so that objects can be created in more than one way. The datetime class has one that parses dates directly from strings.

- The datetime.strptime() constructor takes a date string and a format string, and returns a datetime object. The format string is written in a syntax for describing date and time formats called __strftime__.
    - The strftime syntax uses a series of format codes, each consisting of a % character followed by a single character that specifies a date or time part in a particular format.

| Strftime Code | Meaning | Examples |
| ------------- | ------- | -------- |
| %d | Day of the month as a zero-padded number<sup>1</sup> | 04 |
| %A | Day of the week as a word<sup>2</sup> | Monday |
| %m | Month as a zero-padded number<sup>1</sup> | 09 |
| %Y | Year as a four-digit number | 1901 |
| %y | Year as a two-digit number with zero-padding<sup>1, 3</sup> | 01 (2001) <br> 88 (1988) |
| %B | Month as a word<sup>2</sup> | September |
| %H | Hour in 24 hour time as zero-padded number<sup>1</sup> | 05 (5 a.m.)<br> 15 (3 p.m.) |
| %p | a.m. or p.m.<sup>2</sup> | AM |
| %I | Hour in 12 hour time as zero-padded number<sup>1</sup> | 05 (5 a.m., or 5 p.m. if AM/PM indicates otherwise) |
| %M | Minute as a zero-padded number<sup>1</sup> | 07 |

_1. The strptime parser will parse non-zero padded numbers without raising an error._

_2. Date parts containing words will be interpreted using the locale settings on your computer, so strptime won't be able to parse "febrero" ("February" in Spanish) if your locale is set to an English language locale._

_3. Year values from 00-68 will be interpreted as 2000-2068, with values 69-99 interpreted as 1969-1999._
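
The notes above can be checked with a short sketch. The input strings are hypothetical and use only numeric format codes, so the results should not depend on locale settings.

```python
import datetime as dt

date_format = "%m/%d/%y %H:%M"

# Zero-padded and non-zero-padded inputs parse to the same value (note 1).
padded = dt.datetime.strptime("01/02/15 05:07", date_format)
unpadded = dt.datetime.strptime("1/2/15 5:07", date_format)
print(padded == unpadded)  # True

# Two-digit years switch century between 68 and 69 (note 3).
year_68 = dt.datetime.strptime("1/2/68 5:07", date_format).year
year_69 = dt.datetime.strptime("1/2/69 5:07", date_format).year
print(year_68)  # 2068
print(year_69)  # 1969
```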

In [4]:
# The format of the appt_start_date column is {month}/{day}/{two digit year} {hour 24hr time}:{minute}.
date_format = "%m/%d/%y %H:%M"

for row in potus_data:
    appt_start_date = row[2]  # Index for the appointment start date
    date = dt.datetime.strptime(appt_start_date, date_format)
    row[2] = date

### Using Strftime to Format Dates

The datetime class has a number of attributes which make it easy to retrieve the various parts that make up the date stored within the object:

- _datetime.day_: The day of the month.
- _datetime.month_: The month of the year.
- _datetime.year_: The year.
- _datetime.hour_: The hour of the day.
- _datetime.minute_: The minute of the hour.
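
For instance, these attributes can be read straight from a datetime object. The sketch below rebuilds the `man_on_moon` value from earlier so that it runs on its own.

```python
import datetime as dt

man_on_moon = dt.datetime(1969, 7, 20, 20, 17)

# Each attribute is a plain integer.
print(man_on_moon.day)     # 20
print(man_on_moon.month)   # 7
print(man_on_moon.year)    # 1969
print(man_on_moon.hour)    # 20
print(man_on_moon.minute)  # 17
```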

The datetime class has a datetime.strftime() method which will return a string representation of the date using the strftime syntax.

It's easy to get strptime and strftime confused. One way to remember which is which:

- strptime >> str-p-time >> string parse time
- strftime >> str-f-time >> string format time
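
A short round trip makes the difference concrete: `strptime()` turns a string into a datetime object, and `strftime()` turns a datetime object back into a string. The sample string and the output format are assumptions chosen for illustration.

```python
import datetime as dt

# strptime: string -> datetime (parse)
parsed = dt.datetime.strptime("12/18/15 16:30", "%m/%d/%y %H:%M")

# strftime: datetime -> string (format)
formatted = parsed.strftime("%Y-%m-%d %H:%M")

print(repr(parsed))  # datetime.datetime(2015, 12, 18, 16, 30)
print(formatted)     # 2015-12-18 16:30
```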

Let's use the datetime.strftime() method to create a formatted frequency table and analyze the appointment dates in our data set. We'll:

- Iterate over each of the datetime objects we created in the previous step.
- Create a string containing the month and year from each datetime object.
- Create a frequency table for the month/year of the appointments.

In [5]:
visitors_per_month = {}

for row in potus_data:
    appt_start_date = row[2]  # Index for the appointment start date
    # Call strftime() on the datetime object to get a "{month name}, {year}" string
    date = appt_start_date.strftime("%B, %Y")
    if date not in visitors_per_month:
        visitors_per_month[date] = 1
    else:
        visitors_per_month[date] += 1
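
Once a frequency table like this exists, the busiest month can be read from it. The sketch below uses a handful of hypothetical datetime objects in place of `potus_data`, and the `max()` call with `key=` is one possible approach rather than the lesson's own method. The month names produced by `%B` depend on the locale settings in use.

```python
import datetime as dt

# Hypothetical appointment start times standing in for the real rows.
sample_dates = [
    dt.datetime(2015, 1, 6, 9, 30),
    dt.datetime(2015, 1, 20, 14, 0),
    dt.datetime(2015, 2, 3, 11, 15),
]

visitors_per_month = {}
for appt_start_date in sample_dates:
    month_year = appt_start_date.strftime("%B, %Y")
    visitors_per_month[month_year] = visitors_per_month.get(month_year, 0) + 1

# Pick the key with the largest count.
busiest_month = max(visitors_per_month, key=visitors_per_month.get)
print(busiest_month, visitors_per_month[busiest_month])
```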

### The Time Class