# Step 1: Introduction

I use the popular ToDoist application to track all of my open tasks. ToDoist has many strengths, such as a clean user interface, presence on every platform, and gamification features (ToDoist Karma). One of its main flaws, however, is the lack of built-in tools for tracking productivity.

The purpose of this analysis is to use the ToDoist API to derive insights about how to improve my productivity. Some of the questions investigated are the following: 

- How many tasks were completed on a weekly basis?
- What is the breakdown of completed tasks by required energy level?
- Is there a correlation between number of total tasks completed and required energy level?
- How many professional project tasks contain the word 'meeting' or 'ship'? 

# Step 2: Data Wrangling

In [None]:
import pandas as pd
import numpy as np
import datetime
from todoist.api import TodoistAPI
import pytz
import matplotlib.pyplot as plt
import seaborn as sns

myToken = '6de0c48443150c63ac197ac8fb141a00e471eb03'
api = TodoistAPI(myToken)

In [None]:
print(api.state['projects'][1:3])

## Create Project Table

In [None]:
def parseProjectData(sourceProjectData, idList, projectList, colorList, parentIdList):
    """Creates subset of relevant project data from raw API output.
    Args:
        sourceProjectData: Dictionary of project data pulled directly from ToDoist API.
        idList: List containing project id(s).
        projectList: List containing project names.
        colorList: List containing project colors.
        parentIdList: List containing parent id(s).

    Returns:
        idList: List containing project id(s).
        projectList: List containing project names.
        colorList: List containing project colors.
        parentIdList: List containing parent id(s).
    """
        
    for sourceProject in sourceProjectData:
        # Append to the lists passed in as arguments rather than to globals
        idList.append(sourceProject['id'])
        projectList.append(sourceProject['name'])
        colorList.append(sourceProject['color'])
        parentIdList.append(sourceProject['parent_id'])
    
    return idList, projectList, colorList, parentIdList

In [None]:
myId, myProjects, myColor, myParentId = [], [], [], [] 

currentProjectData = api.state['projects']
archivedProjectData = api.projects.get_archived()

myId, myProjects, myColor, myParentId = parseProjectData(currentProjectData, myId, myProjects, myColor, myParentId)
# myId, myProjects, myColor, myParentId = parseProjectData(archivedProjectData, myId, myProjects, myColor, myParentId)

In [None]:
projectTable = pd.DataFrame({'id' : myId,
                        'parent_id' : myParentId,
                        'color' : myColor }, index = myProjects, dtype = 'int64')

projectTable = projectTable[['id','color','parent_id']]
print (projectTable)

## Create Item Table

In [None]:
rawItemList = api.completed.get_all(project_id = projectTable.loc['Professional','id'], limit = 50, offset = 0, until = '2017-6-29T10:13', since = '2017-5-29T10:13')
len(rawItemList['items'])

In [None]:
def my_range(start, end, step):
    """Custom loop counter.
    Args:
        start: Initial value.
        end: Terminal value.
        step: Increment value.

    Yields:
        start: Next loop value, from the initial value up to (but not including) end.
    """
    while start < end:
        yield start
        start += step
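
As a quick check on what this generator does, the sketch below compares it with Python's built-in `range`. For integer arguments the two should produce the same values; the generator additionally accepts a fractional step. The variable names `int_values`, `builtin_values`, and `half_steps` are only for illustration.

```python
def my_range(start, end, step):
    """Yield start, start + step, ... while the value is below end."""
    while start < end:
        yield start
        start += step

# For integer arguments the generator matches the built-in range
int_values = list(my_range(0, 5, 1))
builtin_values = list(range(0, 5, 1))

# Unlike range, the generator also accepts a fractional step
half_steps = list(my_range(0, 2, 0.5))

print(int_values, builtin_values, half_steps)
```

Since the loop later in this notebook only uses integer steps, `range(0, bufferLength)` would likely work there as well.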

In [None]:
len(api.activity.get(object_type = 'item', event_type = 'completed', until = '2017-12-31T23:13', since = '2017-1-1T00:01', limit = 100))

In [None]:
endDate = '2017-10-31T23:59'
startDate = '2017-08-01T00:01'

#endDateFmt = datetime.datetime.strptime(endDate, "%Y-%m-%dT%H:%M")
countDateFmt = datetime.datetime.strptime(endDate, "%Y-%m-%dT%H:%M")
startDateFmt = datetime.datetime.strptime(startDate, "%Y-%m-%dT%H:%M")

print (countDateFmt < startDateFmt)
print (countDateFmt - datetime.timedelta(days=7))

In [None]:
bufferSize = 50
bufferStart = 0

projectId, projectName, itemId, itemName, completedDate, label = [], [], [], [], [], []

#for bufferStart in range(0,1500,50):
while (countDateFmt > startDateFmt):
    
    itemBuffer = api.completed.get_all(limit = bufferSize, offset = bufferStart, until = endDate, since = startDate)
    #print(int(len(itemBuffer['items'])))
    bufferLength = int(len(itemBuffer['items']))
    
    for bufferIndex in my_range(0,bufferLength,1):
        projectId.append(itemBuffer['items'][bufferIndex]['project_id'])
        itemId.append(itemBuffer['items'][bufferIndex]['id'])
        itemName.append(itemBuffer['items'][bufferIndex]['content'])
        completedDate.append(itemBuffer['items'][bufferIndex]['completed_date'])
    
    bufferStart += bufferLength
    countDateFmt = countDateFmt - datetime.timedelta(days=7)
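
The loop above pages through results with `limit` and `offset` and uses a date counter to decide when to stop. A different stopping rule, sketched below, is to keep requesting pages until one comes back empty. This is only a sketch: `fetch_page` is a hypothetical stand-in that slices a local list, not a call from the ToDoist library, and `fake_items` is made-up data.

```python
def fetch_page(all_items, limit, offset):
    """Hypothetical stand-in for a paged API call; returns one page of items."""
    return {'items': all_items[offset:offset + limit]}

def fetch_all(all_items, page_size=50):
    """Collect every item by advancing the offset until a page comes back empty."""
    collected = []
    offset = 0
    while True:
        page = fetch_page(all_items, limit=page_size, offset=offset)
        if not page['items']:
            break  # an empty page means there is nothing left to fetch
        collected.extend(page['items'])
        offset += len(page['items'])
    return collected

# Made-up data: 120 items, so three requests return data and a fourth returns nothing
fake_items = [{'id': n, 'content': 'task %d' % n} for n in range(120)]
result = fetch_all(fake_items, page_size=50)
print(len(result))
```

Stopping on an empty page avoids tying the number of requests to the length of the date range.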

In [None]:
print (len(projectId))

In [None]:
itemTable = pd.DataFrame({'Project_Id' : projectId,
                          'Project' : None,
                          'Item_Id' : itemId,
                          'Content' : itemName,
                          'Date' : completedDate,
                          'Energy' : None,
                          'Time' : None})

itemTable = itemTable[['Project_Id', 'Project', 'Item_Id','Content', 'Date', 'Energy', 'Time']]

In [None]:
itemTable

In [None]:
print (itemTable.loc[0]["Date"])

In [None]:
%run -i 'data_download_projects.py'

In [None]:
%run -i 'data_download_items.py'

In [None]:
def localize_time(api_time, myTimeZone):
    """Convert timezone from UTC to local.
    Args:
        api_time: UTC time from ToDoist API.
        myTimeZone: Pytz object generated for local timezone.

    Returns:
        api_time: Modified for local timezone in datetime object format.
    """    
    api_time = datetime.datetime.strptime(api_time, "%a %d %b %Y %H:%M:%S %z")
    
    return api_time.astimezone(myTimeZone)
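
To illustrate the conversion, the sketch below parses a made-up timestamp in the same format and shifts it to a local offset. It assumes a fixed UTC-4 offset from the standard library as a stand-in for the `pytz` US/Eastern zone (which would also handle daylight saving changes); the sample string is not real API output.

```python
import datetime

# Sample timestamp in the format that localize_time parses (made-up value)
sample = "Mon 07 Aug 2017 14:30:00 +0000"
utc_time = datetime.datetime.strptime(sample, "%a %d %b %Y %H:%M:%S %z")

# Fixed UTC-4 offset, used here as a stand-in for US/Eastern in summer
eastern_daylight = datetime.timezone(datetime.timedelta(hours=-4))
local_time = utc_time.astimezone(eastern_daylight)

# Same instant in time, different wall-clock reading
print(utc_time.isoformat())
print(local_time.isoformat())
```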

In [None]:
def parse_full_date(local_date):
    """Converts a localized datetime to an mm-dd-yyyy string.
    Args:
        local_date: Localized date and time as a datetime object.

    Returns:
        local_date: Date as a string in mm-dd-yyyy format.
    """    
    return local_date.strftime('%m-%d-%Y')

In [None]:
def energy_level(content):
    """Categorizes task by energy label.
    Args:
        content: Complete task description from ToDoist.

    Returns:
        Appropriate energy label in string format.
    """
    if (content.find("@High-energy") != -1):
        return "High-energy"
    elif (content.find("@Normal-energy") != -1):
        return "Normal-energy"
    elif (content.find("@Low-energy") != -1):
        return "Low-energy"
    else:
        return None

In [None]:
def time_effort(content):
    """Categorizes task by time label.
    Args:
        content: Complete task description from ToDoist.

    Returns:
        Appropriate time label in int format.
    """
    if (content.find("@60-minutes") != -1 ):
        return 60
    elif (content.find("@30-minutes") != -1):
        return 30
    elif (content.find("@5-minutes") != -1):
        return 5
    else:
        return None
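
The two label functions above follow the same pattern: look for a known tag in the task text and return the matching value. A table-driven variant is sketched below, assuming the same tag names as above; `ENERGY_LABELS`, `TIME_LABELS`, and the sample task text are illustrative and not part of the original notebook.

```python
# Known labels, kept in one place so new ones can be added without new branches
ENERGY_LABELS = ("High-energy", "Normal-energy", "Low-energy")
TIME_LABELS = {"@60-minutes": 60, "@30-minutes": 30, "@5-minutes": 5}

def energy_level(content):
    """Return the first energy label tagged in the task text, or None."""
    for label in ENERGY_LABELS:
        if "@" + label in content:
            return label
    return None

def time_effort(content):
    """Return the minutes for the first time tag found in the task text, or None."""
    for tag, minutes in TIME_LABELS.items():
        if tag in content:
            return minutes
    return None

sample = "Weekly design meeting @High-energy @60-minutes"
print(energy_level(sample), time_effort(sample))
```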

In [None]:
def project_link(project_id):
    """Links project id with project name
    Args:
        project_id: Identifier for project in int format.

    Returns:
        Project name in string format.
    """    
    return (str(projectTable.index[projectTable['id'] == project_id][0]))
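
The lookup above scans the project table once per item and raises an `IndexError` if a project id is missing (for example, an archived project). One possible alternative, sketched here on made-up data, is to build an id-to-name mapping once and apply it with `Series.map`, which leaves unknown ids as missing values. The names `project_table`, `id_to_name`, and `items` are illustrative only.

```python
import pandas as pd

# Toy project table indexed by project name, mirroring the layout used above
project_table = pd.DataFrame({'id': [101, 102]}, index=['Professional', 'Personal'])

# Build the id -> name mapping once
id_to_name = pd.Series(project_table.index.values, index=project_table['id'].values)

# Toy item table; 999 is an id with no matching project
items = pd.DataFrame({'Project_Id': [102, 101, 101, 999]})
items['Project'] = items['Project_Id'].map(id_to_name)
print(items)
```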

In [None]:
# Convert completed time from UTC to US Eastern time
myTimeZone = pytz.timezone('US/Eastern')
itemTable["Date"] = itemTable["Date"].apply(localize_time, args=(myTimeZone,))

# Strip local time to include date only
#itemTable["Date"] = itemTable["Date"].map(parse_full_date)

# Categorize energy labels
itemTable["Energy"] = itemTable["Content"].map(energy_level)

# Categorize time labels
itemTable["Time"] = itemTable["Content"].map(time_effort)

# Match project id with project name
itemTable["Project"] = itemTable["Project_Id"].apply(project_link)

In [None]:
itemTable.iloc[0:30]

# Step 3: Data Analysis

## Number of tasks completed each week

I am quite interested in investigating how many tasks I completed each week for the past year. In order to do this, I'll group the number of items completed each week (in the Item Table) and plot that information.

In [None]:
# Produces a Pandas groupby object with date, project as keys, and indices of relevant columns
gDateProject_all = itemTable.groupby([pd.Grouper(freq='1W', key='Date'),'Project'], as_index=True)

# Gives list of dictionary keys and indices of relevant columns
gDateProject_all.groups

# Gives count of items in grouped list (by dictionary key)
gDateProject_all.count()

# Gets slice of relevant column from grouped dataframe
gDateProject_content = gDateProject_all['Content']

# Gives count of items from sliced column
gDateProject_content = gDateProject_content.count()
#gDateProject_content

In [None]:
# Groups already grouped dataframe by parent index (level=0) AND sums it up
gDate_content_count = gDateProject_content.groupby(level=0)
gDate_content_count = gDate_content_count.sum()
#gDate_content_count

In [None]:
# Raw table showing all the tasks completed each week this year
gDate_vProject = gDateProject_content.unstack(level=1, fill_value=0)
gDate_vProject
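
To make the grouping steps above easier to follow, here is a minimal sketch on a made-up four-row table (not the real item data). It groups by week and project, counts the items, and pivots projects into columns, using the same `pd.Grouper` and `unstack` calls as above. Weekly bins with `freq='1W'` end on Sundays.

```python
import pandas as pd

# Made-up completed items: three in the week ending 2017-08-06, one in the next week
toy = pd.DataFrame({
    'Date': pd.to_datetime(['2017-08-01', '2017-08-02', '2017-08-03', '2017-08-08']),
    'Project': ['Professional', 'Personal', 'Professional', 'Professional'],
    'Content': ['ship parts', 'buy tickets', 'team meeting', 'write report'],
})

# Count items per (week, project), then turn projects into columns
weekly = (toy.groupby([pd.Grouper(freq='1W', key='Date'), 'Project'])['Content']
             .count()
             .unstack(level=1, fill_value=0))
print(weekly)
```

Each row of `weekly` is one week and each column one project, which is the shape the stacked bar chart expects.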

In [None]:
%matplotlib inline

stackTaskCompleted = gDate_vProject.plot.bar(stacked=True, figsize=(15, 15))
stackTaskCompleted.set_ylabel("Tasks Completed")
stackTaskCompleted.set_title("Weekly Task Completion")

It's interesting to note some of the peaks and valleys in the graph above (each x-axis label represents the date on which a week ends):
- I was puzzled by low task completion numbers at the end of March and early April. Upon further reflection, I realized that during this time period I was in Europe with my parents (on vacation).
- There were a few weeks at the end of July and during August which were extremely fruitful. This is unsurprising, since that period involved a sprint to complete my Intro to Programming Nanodegree, moving apartments, important work deadlines, and preparation for my trip to Japan.

This information is quite interesting, but I know that not all tasks are created equal. Completing a dizzying array of low-energy tasks (taking out the trash, buying tickets to a show, cancelling my Comcast subscription, etc.) doesn't really help me achieve big, hairy goals.

## Relationship between energy level and overall task completion

In order to get a more accurate picture of productivity, I want to compare the number of high-energy tasks (which require deep work) against the total number of tasks completed in each week.

Below, I first graph the overall breakdown of tasks by energy type. Second, I compare the number of total tasks completed with the number of high energy tasks completed each week.

In [None]:
# Plot of tasks completed each week by energy level

gDateEnergy = itemTable.groupby([pd.Grouper(freq='1W', key='Date'),'Energy'], as_index=True)
gDateEnergy_content = gDateEnergy['Content']

# cust = itemTable.groupby(['Energy'], as_index=True)
# cust.get_group('Normal-energy')

In [None]:
gDateEnergy_plot = gDateEnergy_content.count().unstack().plot.bar(stacked=True, figsize=(10, 10))
gDateEnergy_plot.set_title('Task Completion (Sorted by Energy Level)')
gDateEnergy_plot.set_ylabel("Tasks Completed")

In [None]:
# gDateEnergy_content.count()
gDate_vEnergy = gDateEnergy_content.count().unstack()
gDate_vEnergy = gDate_vEnergy.fillna(0)

In [None]:
gDate_vEnergy['Total'] = gDate_vEnergy.sum(axis=1)
gDate_vEnergy

In [None]:
highEnergy_vTotal = gDate_vEnergy.plot(x='Total', y='High-energy', style='o', figsize=(10, 10))
highEnergy_vTotal.set_xlim([0,53])
highEnergy_vTotal.set_ylim([-1,12])
highEnergy_vTotal.set_title('Relationship b/w high energy and total task count')
highEnergy_vTotal.set_xlabel('# Of Total Tasks')
highEnergy_vTotal.set_ylabel('# Of High-Energy Tasks')

gDate_vEnergy_x = gDate_vEnergy['Total']
gDate_vEnergy_y = gDate_vEnergy['High-energy']

gDate_vEnergy_fit = np.polyfit(gDate_vEnergy_x, gDate_vEnergy_y, deg=1)
highEnergy_vTotal.plot(gDate_vEnergy_x, gDate_vEnergy_fit[0] * gDate_vEnergy_x + gDate_vEnergy_fit[1], color='red')

highEnergy_vTotal_corr = gDate_vEnergy['High-energy'].corr(gDate_vEnergy['Total'])
print ('Correlation: ' + str(round(highEnergy_vTotal_corr,2)))

# gDateEnergy_content.count().groupby(level=1).sum()

As shown by the plot, there is a positive correlation between the number of high-energy tasks and the total number of tasks completed each week. This result was quite surprising to me. My intuition suggested that I completed fewer tasks overall in the weeks when I got a lot of high-energy tasks done (because they tend to take longer). But as it turns out, getting high-energy tasks done may motivate me to get more done overall.
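
For reference, the sketch below shows how a correlation coefficient and a fitted line relate on a small set of made-up weekly counts (these are not the numbers from this analysis). It uses `np.corrcoef` for the Pearson correlation and `np.polyfit` with `deg=1` for the trend line, the same fitting call used above.

```python
import numpy as np

# Made-up weekly counts, for illustration only
total_tasks = np.array([10, 20, 30, 40, 50])
high_energy = np.array([1, 3, 4, 7, 9])

# Pearson correlation: off-diagonal entry of the 2x2 correlation matrix
r = np.corrcoef(total_tasks, high_energy)[0, 1]

# Least-squares line: high_energy is approximately slope * total_tasks + intercept
slope, intercept = np.polyfit(total_tasks, high_energy, deg=1)

print('Correlation: ' + str(round(r, 2)))
print('Slope: ' + str(round(slope, 2)) + ', intercept: ' + str(round(intercept, 2)))
```

The correlation measures how tightly the points follow a line, while the slope says how many additional high-energy tasks go along with each additional task overall.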

## Miscellaneous

One of my side responsibilities at work is to coordinate the receipt and shipment of prototype parts to other locations. I was curious to see how many shipments I had made (by filtering tasks by the word "ship").

In [None]:
# Match "ship" regardless of capitalization (e.g. "Ship parts")
ship_hash = itemTable['Content'].str.contains("ship", case=False)
print ("Total Shipments : " + str(ship_hash.sum()))

Wow! That's a lot of shipments. I'm going to discuss this with my manager to see whether we can hire someone to make shipments for us.

Next, I was interested to investigate how many meetings I had attended or hosted.

In [None]:
itemTable['Content'] = itemTable['Content'].str.lower()
meeting_hash = itemTable['Content'].str.contains("meeting")
print ("Total Meeting Count : " + str(meeting_hash.sum()))

In [None]:
totalMeetingTime = ((itemTable['Time'] * meeting_hash).sum()) / 60
print ("Total Meeting Time : " + str(round(totalMeetingTime)) + " " + "Hours")

Overall, I spent over a month in meetings (assuming 40-hour work weeks) this year! Because of the nature of my work, it is unlikely that I will ever be able to get this number close to zero. However, I wouldn't be surprised if I could cut my meeting time in half; I should also have a discussion with my manager on how to do this.
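
One caveat on the meeting-time total: tasks without a time label have a missing `Time` value, and pandas skips missing values when summing, so unlabeled meetings count as zero minutes. The sketch below shows this on made-up rows (not the real item data); the names `toy`, `is_meeting`, and `unlabeled` are illustrative.

```python
import pandas as pd

# Made-up tasks: three meetings, one of which has no time label
toy = pd.DataFrame({
    'Content': ['Team Meeting @60-minutes', 'ship parts',
                'meeting prep @30-minutes', 'Client meeting'],
    'Time': [60, None, 30, None],
})

# Case-insensitive match on the word "meeting"
is_meeting = toy['Content'].str.contains('meeting', case=False)

# Sum of labeled minutes; missing values are skipped by default
meeting_minutes = toy.loc[is_meeting, 'Time'].sum()

# Number of meetings that contribute nothing to the total
unlabeled = toy.loc[is_meeting, 'Time'].isna().sum()

print(meeting_minutes, unlabeled)
```

Under this assumption the reported meeting time is best read as a lower bound.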

# Conclusion

This project was quite helpful in understanding some basic facts about my productivity history:

- Overall, I learned that the end of June and early August were my most productive periods (when I had quite a few parallel projects due).
- There is a positive correlation between the number of high-energy tasks completed and total tasks. Getting the hard tasks accomplished may motivate me to complete more of the easier tasks.
- I've spent a significant portion of my time (over 150 hours) in meetings and making shipments. I need to have a discussion about these two topics with my manager to see how we can reduce the amount of time spent on them.

Completing this analysis only leads to more questions, however. In a future revision to this project, I would be curious to explore the following:
- How did the number of meetings in a week affect the total number of high-energy tasks completed?
- Why exactly was I able to complete more high-energy tasks in certain weeks as opposed to others? Could the number of my social engagements with friends have had an impact?
- Was I able to complete high-energy tasks during the week or on the weekend?
