# RescueTime Downloader

Code to collect and export RescueTime Activity Logs, includes options to collect in hourly or minute bins. Default is hourly.

**NOTE:** Collecting Full History takes some time, depending how many years of data you have. I recommend you configure the script below to pull data in yearly chunks, though it should work if you attempt to export full history. 

------

## Setup and Installation

* Go to [RescueTime API](https://www.rescuetime.com/anapi/manage) and copy an API Key 
* Copy credentials-sample.json to create credentials.json and add your RescueTime Key.
* This project depends on no additional code besides standard python libraries and Pandas. 

-----

## Dependencies

In [80]:
import requests
import os
from datetime import date, datetime, timedelta as td
import pandas as pd

----

## Credentials

In [81]:
import json

with open("credentials.json", "r") as file:
    credentials = json.load(file)
    rescuetime_cr = credentials['rescuetime']
    KEY = rescuetime_cr['KEY']

In [82]:
baseurl = 'https://www.rescuetime.com/anapi/data?key='

In [83]:
url =  baseurl + KEY

----

## Export Dates Configuration

In [87]:
# Configure These to Your Preferred Dates
start_date = '2019-11-04'  # Start date for data
end_date   = '2019-11-07'  # End date for data

------

## Function to Get RescueTime Activities

In [85]:
# Adjustable by Time Period
def rescuetime_get_activities(start_date, end_date, resolution='hour'):
    # Configuration for Query
    # SEE: https://www.rescuetime.com/apidoc
    payload = {
        'perspective':'interval',
        'resolution_time': resolution, #1 of "month", "week", "day", "hour", "minute"
        'restrict_kind':'document',
        'restrict_begin': start_date,
        'restrict_end': end_date,
        'format':'json' #csv
    }
    
    # Setup Iteration - by Day
    d1 = datetime.strptime(payload['restrict_begin'], "%Y-%m-%d").date()
    d2 = datetime.strptime(payload['restrict_end'], "%Y-%m-%d").date()
    delta = d2 - d1
    
    activities_list = []
    
    # Iterate through the days, making a request per day
    for i in range(delta.days + 1):
        # Find iter date and set begin and end values to this to extract at once.
        d3 = d1 + td(days=i) # Add a day
        if d3.day == 1: print('Pulling Monthly Data for ', d3)

        # Update the Payload
        payload['restrict_begin'] = str(d3) # Set payload days to current
        payload['restrict_end'] = str(d3)   # Set payload days to current

        # Request
        try: 
            r = requests.get(url, payload) # Make Request
            iter_result = r.json() # Parse result
            # print("Collecting Activities for " + str(d3))
        except: 
            print("Error collecting data for " + str(d3))
    
        for i in iter_result['rows']:
            activities_list.append(i)
            
    return activities_list

---

## Collect Report of Activites By Day

In [86]:
activities_day_log = rescuetime_get_activities(start_date, end_date, 'day')

Error collecting data for 2019-11-04


UnboundLocalError: local variable 'iter_result' referenced before assignment

In [68]:
activities_daily = pd.DataFrame.from_dict(activities_day_log)

In [69]:
activities_daily.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 246 entries, 0 to 245
Data columns (total 7 columns):
0    246 non-null object
1    246 non-null int64
2    246 non-null int64
3    246 non-null object
4    246 non-null object
5    246 non-null object
6    246 non-null int64
dtypes: int64(3), object(4)
memory usage: 13.5+ KB


In [63]:
activities_daily.describe()

Unnamed: 0,1,2,6
count,246.0,246.0,246.0
mean,103.597561,1.0,0.934959
std,295.260336,0.0,1.100945
min,1.0,1.0,-2.0
25%,3.0,1.0,0.0
50%,9.5,1.0,1.0
75%,41.75,1.0,2.0
max,2288.0,1.0,2.0


In [64]:
activities_daily.tail()

Unnamed: 0,0,1,2,3,4,5,6
241,2019-11-06T00:00:00,1,1,Windows Explorer,2. Homework,General Utilities,1
242,2019-11-06T00:00:00,1,1,rescuetime.com,No Details,Intelligence,2
243,2019-11-06T00:00:00,1,1,accounts.google.com,https://accounts.google.com/signin/oauth/conse...,General Communication & Scheduling,0
244,2019-11-06T00:00:00,1,1,Windows Explorer,100% complete,General Utilities,1
245,2019-11-06T00:00:00,1,1,linkedin.com,No Details,Professional Networking,1


----

## Collect Report of Activites By Hour

In [48]:
activities_hour_log = rescuetime_get_activities(start_date, end_date, 'hour')

In [49]:
activities_hourly = pd.DataFrame.from_dict(activities_hour_log)

In [50]:
activities_hourly.columns = ['Date', 'Seconds', 'NumberPeople', 'Actitivity', 'Document', 'Category', 'Productivity']

In [51]:
activities_hourly.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 361 entries, 0 to 360
Data columns (total 7 columns):
Date            361 non-null object
Seconds         361 non-null int64
NumberPeople    361 non-null int64
Actitivity      361 non-null object
Document        361 non-null object
Category        361 non-null object
Productivity    361 non-null int64
dtypes: int64(3), object(4)
memory usage: 19.8+ KB


In [52]:
activities_hourly.describe()

Unnamed: 0,Seconds,NumberPeople,Productivity
count,361.0,361.0,361.0
mean,70.595568,1.0,0.975069
std,209.840793,0.0,1.088801
min,1.0,1.0,-2.0
25%,3.0,1.0,0.0
50%,8.0,1.0,1.0
75%,34.0,1.0,2.0
max,2075.0,1.0,2.0


In [53]:
activities_hourly.tail()

Unnamed: 0,Date,Seconds,NumberPeople,Actitivity,Document,Category,Productivity
356,2019-11-06T13:00:00,2,1,rescuetime.com,Sem título - Google Chrome,Intelligence,2
357,2019-11-06T13:00:00,1,1,trello.com,Prifina Product Prototyping | Trello,Project Management,2
358,2019-11-06T13:00:00,1,1,rescuetime.com,No Details,Intelligence,2
359,2019-11-06T13:00:00,1,1,github.com,No Details,General Software Development,2
360,2019-11-06T13:00:00,1,1,localhost:8888,rescuetime/,General Software Development,2


In [54]:
activities_hourly.to_csv('data/rescuetime-hourly-' + start_date + '-to-' + end_date + '.csv')

## Collect Report of Activites By Minute

In [20]:
activities_minute_log = rescuetime_get_activities(start_date, end_date, 'minute')

In [21]:
activities_per_minute = pd.DataFrame.from_dict(activities_minute_log)

In [22]:
# Date', u'Time Spent (seconds)', u'Number of People', u'Activity', u'Document', u'Category', u'Productivity'
activities_per_minute.columns = ['Date', 'Seconds', 'NumberPeople', 'Actitivity', 'Document', 'Category', 'Productivity']

In [23]:
activities_per_minute.head()

Unnamed: 0,Date,Seconds,NumberPeople,Actitivity,Document,Category,Productivity
0,2019-11-05T14:55:00,29,1,rescuetime.com,RescueTime - Setting up your RescueTime applic...,Intelligence,2
1,2019-11-05T14:55:00,5,1,RescueTime,RescueTime,Intelligence,2
2,2019-11-05T14:55:00,4,1,rescuetime.com,RescueTime - Setting up your RescueTime applic...,Intelligence,2
3,2019-11-05T15:00:00,158,1,Google Documents,Time Management Research - Quantified Self - G...,Writing,2
4,2019-11-05T15:00:00,158,1,Google Documents,Time Management Research - Quantified Self - G...,Writing,2


In [24]:
activities_per_minute.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 611 entries, 0 to 610
Data columns (total 7 columns):
Date            611 non-null object
Seconds         611 non-null int64
NumberPeople    611 non-null int64
Actitivity      611 non-null object
Document        611 non-null object
Category        611 non-null object
Productivity    611 non-null int64
dtypes: int64(3), object(4)
memory usage: 33.5+ KB


In [25]:
activities_per_minute.describe()

Unnamed: 0,Seconds,NumberPeople,Productivity
count,611.0,611.0,611.0
mean,41.710311,1.0,0.991817
std,77.145314,0.0,1.087152
min,1.0,1.0,-2.0
25%,2.0,1.0,0.0
50%,7.0,1.0,1.0
75%,40.0,1.0,2.0
max,515.0,1.0,2.0


In [26]:
activities_per_minute.to_csv('data/rescuetime-by-minute' + start_date + '-to-' + end_date + '.csv')

-----

## Simple Analysis (Using Exported Logs)

In [27]:
import glob
import os

In [28]:
# import hourly data exports and create a single data frame
path = 'data/'
allFiles = glob.glob(path + "/rescuetime-hourly*.csv")
timelogs = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=0)
    list_.append(df)
activities = pd.concat(list_)

In [29]:
len(activities) # 312477

361

In [30]:
# total hours
activities.Seconds.sum() / 60 / 60

7.079166666666667

In [31]:
# total days
activities.Seconds.sum() / 60 / 60 / 24

0.29496527777777776

In [32]:
activities.head()

Unnamed: 0.1,Unnamed: 0,Date,Seconds,NumberPeople,Actitivity,Document,Category,Productivity
0,0,2019-11-05T14:00:00,29,1,rescuetime.com,RescueTime - Setting up your RescueTime applic...,Intelligence,2
1,1,2019-11-05T14:00:00,5,1,RescueTime,RescueTime,Intelligence,2
2,2,2019-11-05T14:00:00,4,1,rescuetime.com,RescueTime - Setting up your RescueTime applic...,Intelligence,2
3,3,2019-11-05T15:00:00,2075,1,markwk.com,No Details,Uncategorized,0
4,4,2019-11-05T15:00:00,700,1,trello.com,Time Management Dashboard Prototype on Prifina...,Project Management,2


In [33]:
activities.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 361 entries, 0 to 360
Data columns (total 8 columns):
Unnamed: 0      361 non-null int64
Date            361 non-null object
Seconds         361 non-null int64
NumberPeople    361 non-null int64
Actitivity      361 non-null object
Document        361 non-null object
Category        361 non-null object
Productivity    361 non-null int64
dtypes: int64(4), object(4)
memory usage: 22.6+ KB


In [70]:
activities.describe()

Unnamed: 0.1,Unnamed: 0,Seconds,NumberPeople,Productivity
count,361.0,361.0,361.0,361.0
mean,180.0,70.595568,1.0,0.975069
std,104.355961,209.840793,0.0,1.088801
min,0.0,1.0,1.0,-2.0
25%,90.0,3.0,1.0,0.0
50%,180.0,8.0,1.0,1.0
75%,270.0,34.0,1.0,2.0
max,360.0,2075.0,1.0,2.0


In [73]:
activities.to_csv('data/rescuetime-full-data-export.csv',index=False)

In [323]:
# create columns for year, month, day, and dow

In [36]:
# pivot table 
# activities.pivot(index='date', columns='Category', values='seconds')
# temp.pivot(columns='Category', values='Seconds')