# Pocket Article Downloader

----

Want to know how many articles you've read, and how many you still have to read, in Pocket?

This code provides setup and authorization to Pocket's API. We then pull your read and unread articles and export them to CSV. 

For more information and additional configuration options, see the [Pocket API Documentation](https://getpocket.com/developer/docs/overview).

-----

# Authentication and Pocket Developer Setup

Note: This setup may take a few minutes. The code is indebted to [What’s in your Pocket? Visualizing your Reading List with Python](https://www.twilio.com/blog/2017/09/whats-in-your-pocket-visualizing-your-reading-list-with-python.html). If you run into any issues, refer to that article for screenshots and more details on setup.

### Step 1: Initial Developer Setup

* Create an app on Pocket's Developer API Portal: https://getpocket.com/developer/apps/new
* Ensure you add the Retrieve permission
* Copy your Consumer Key and store it using either Option 1 or Option 2
* Option 1 (easiest but less secure): Copy your keys and store them in the notebook
* Option 2 (more secure, since keys are not stored in the notebook): Copy sample-credentials.json to credentials.json and add your keys there

In [18]:
# Option 1 (Easiest but less secure):  
# Copy your keys here after each step

# CONSUMERKEY = 'add code here'
# REQUESTCODE = 'add code here'
# ACCESSTOKEN = 'add code here'

In [19]:
# Option 2 (More Secure since not stored in notebook): 
# Copy sample-credentials.json to credentials.json and fill in your keys

# Uncomment the lines below and update credentials.json after each step

# import json

# with open("credentials.json", "r") as file:
#    credentials = json.load(file)
#    pocket_cr = credentials['pocket']
#    CONSUMERKEY = pocket_cr['CONSUMERKEY'] # step 1 your consumer key
#    REQUESTCODE = pocket_cr['REQUESTCODE'] # step 2 your request token
#    ACCESSTOKEN = pocket_cr['ACCESSTOKEN'] # step 4 your access token
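
The loading code above implies a particular layout for `credentials.json`. The sketch below writes a placeholder file in that layout and reads it back. The layout is an assumption inferred from the keys this notebook reads; the real `sample-credentials.json` may contain more sections. The helper names `write_credentials_template` and `load_pocket_credentials` are hypothetical.

```python
import json

# Assumed layout, inferred from the keys the notebook reads from credentials.json.
# The actual sample-credentials.json may differ or contain additional sections.
TEMPLATE = {
    "pocket": {
        "CONSUMERKEY": "add code here",  # step 1: your consumer key
        "REQUESTCODE": "add code here",  # step 2: your request code
        "ACCESSTOKEN": "add code here",  # step 4: your access token
    }
}


def write_credentials_template(path="credentials.json"):
    """Write a placeholder credentials file to fill in after each step."""
    with open(path, "w") as file:
        json.dump(TEMPLATE, file, indent=2)


def load_pocket_credentials(path="credentials.json"):
    """Return the 'pocket' section of the credentials file as a dict."""
    with open(path, "r") as file:
        return json.load(file)["pocket"]
```

Keeping `credentials.json` out of version control (for example via `.gitignore`) is what makes this option safer than pasting keys into the notebook.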

In [20]:
# Step 2: Obtain a request code

# Uncomment the lines below and run them
# Then copy the returned request code into your keys (Option 1 or 2)

# import requests
# pocket_api = requests.post('https://getpocket.com/v3/oauth/request', 
#                           data = {'consumer_key': CONSUMERKEY, 
#                                   'redirect_uri':'https://google.com'})

# uncomment line below to see your request code
# pocket_api.text
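
By default the reply in `pocket_api.text` is a form-encoded string along the lines of `code=...`. Here is a minimal sketch of pulling the value out with the standard library, assuming that default format. The sample string is made up and `parse_form_body` is a hypothetical helper.

```python
from urllib.parse import parse_qs


def parse_form_body(body):
    """Turn a form-encoded body such as 'code=abc' into a plain dict."""
    return {key: values[0] for key, values in parse_qs(body).items()}


# Made-up sample standing in for pocket_api.text
sample_body = "code=example-request-code"
request_code = parse_form_body(sample_body)["code"]
```

The same helper should work for the Step 4 reply, which is expected to carry an `access_token` field in the same encoding.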

In [21]:
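# Step 3: Authorize the app in your browser

# Replace [Your-Request-Code] in the URL below with your request code from Step 2, then visit it:
# https://getpocket.com/auth/authorize?request_token=[Your-Request-Code]&redirect_uri=https://google.com

# Keep your request code; it is needed again in Step 4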
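
Rather than editing the URL by hand, it can be assembled in code. This is a small sketch using the same endpoint and redirect URI as the step above; `build_authorize_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_authorize_url(request_code, redirect_uri="https://google.com"):
    """Build the browser URL used to approve the request code."""
    query = urlencode({"request_token": request_code, "redirect_uri": redirect_uri})
    return "https://getpocket.com/auth/authorize?" + query
```

Note that `urlencode` percent-encodes the redirect URI, so the result looks slightly different from the hand-written URL while pointing to the same page.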

In [22]:
# Step 4: Exchange the request code for an access token

# Uncomment the lines below and run them
# Then copy the returned access token into your keys (Option 1 or 2)

#import requests
#pocket_auth = requests.post('https://getpocket.com/v3/oauth/authorize', 
#                            data = {'consumer_key': CONSUMERKEY, 
#                                    'code': REQUESTCODE})

# uncomment line below to see your access token code
# pocket_auth.text

------

# Get and Export Current, Unread Articles

In [75]:
from pocket import Pocket, PocketException
import json
import numpy as np
import pandas as pd

In [3]:
# If this is your first time running the script, read "Authentication and Pocket Developer Setup"
# and follow the steps above to update your keys and tokens

with open("credentials.json", "r") as file:
    credentials = json.load(file)
    pocket_cr = credentials['pocket']
    CONSUMERKEY = pocket_cr['CONSUMERKEY'] # step 1 your consumer key
    REQUESTCODE = pocket_cr['REQUESTCODE'] # step 2 your request token
    ACCESSTOKEN = pocket_cr['ACCESSTOKEN'] # step 4 your access token

In [4]:
# Set up the Pocket object
p = Pocket(
    consumer_key=CONSUMERKEY,
    access_token=ACCESSTOKEN
)

In [111]:
# Retrieve all unread articles
articles_dict = {}

# Get up to 5000 unread articles in a single request
lis = p.get(state="unread", count=5000)
articles_dict.update(lis[0]['list'])

# Build a dataframe with one row per article, indexed by item id
unread_articles = pd.DataFrame.from_dict(articles_dict, orient='index')

In [112]:
# convert unix time to datetime
# (timestamps arrive as strings, so cast them to numbers before converting)
unread_articles['time_added'] = pd.to_datetime(pd.to_numeric(unread_articles['time_added']), unit='s')
unread_articles['time_updated'] = pd.to_datetime(pd.to_numeric(unread_articles['time_updated']), unit='s')

# replace zeros (never favorited) with nan
unread_articles.loc[unread_articles['time_favorited'] == '0','time_favorited'] = np.nan
unread_articles['time_favorited'] = pd.to_datetime(pd.to_numeric(unread_articles['time_favorited']), unit='s')
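
To make the conversion concrete, here is what the cell above does to a single value, sketched with the standard library only. `to_utc_datetime` is a hypothetical helper. It attaches an explicit UTC timezone, whereas the pandas call produces timezone-naive values for the same instant.

```python
from datetime import datetime, timezone


def to_utc_datetime(epoch_seconds):
    """Convert a Unix timestamp string to a UTC datetime; '0' means 'not set'."""
    if epoch_seconds in (None, "", "0"):
        return None
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
```

A value of `'0'` in `time_favorited` maps to `None` here, mirroring the replacement with `np.nan` in the dataframe.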

In [114]:
# unread count
len(unread_articles)
# unread_articles.head()

477

In [115]:
# export to csv
unread_articles.to_csv('data/pocket_unread_articles.csv', index=False)
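
The export above writes into a `data/` folder, and `to_csv` does not create missing folders. A minimal sketch of making sure the target folder exists first follows; `ensure_parent_dir` is a hypothetical helper.

```python
from pathlib import Path


def ensure_parent_dir(csv_path):
    """Create the folder that will hold csv_path if it does not exist yet."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

For example, `unread_articles.to_csv(ensure_parent_dir('data/pocket_unread_articles.csv'), index=False)` would behave like the cell above, but also works on a fresh checkout.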

----

# Get and Export Read Articles

In [116]:
# Get the oldest article in your Pocket archive

oldest_date = ''
oldest_art = p.get(state="archive", count=1, sort='oldest')
oldest_article = oldest_art[0]['list']
# The response is keyed by item id; with count=1 there is at most one entry
for item_id in oldest_article:
    oldest_date = oldest_article[item_id]['time_added']

# print(oldest_date) 
# oldest_article

In [117]:
# Retrieve all read (archived) articles since the oldest date

articles_dict = {}
offset = 0

# Get the first 1000 articles
print("Getting first 1000 read articles in Pocket...")
lis = p.get(since=oldest_date, state="archive", count=1000, sort='oldest')
articles_dict.update(lis[0]['list'])

# Keep requesting pages of 1000 until the API returns an empty list
while lis[0]['list']:
    offset += 1000
    print("Getting an additional 1000 read articles...")
    lis = p.get(since=oldest_date, state="archive", count=1000, sort='oldest', offset=offset)
    articles_dict.update(lis[0]['list'])

print("Completed. No More Read Articles to pull.")

# create dataframe
read_articles = pd.DataFrame.from_dict(articles_dict, orient='index')

Getting first 1000 read articles in Pocket...
Getting an additional 1000 read articles...
Getting an additional 1000 read articles...
Getting an additional 1000 read articles...
Getting an additional 1000 read articles...
Getting an additional 1000 read articles...
Getting an additional 1000 read articles...
Getting an additional 1000 read articles...
Completed. No More Read Articles to pull.
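
The paging pattern used above (request a page, advance the offset, stop on an empty page) can be sketched independently of Pocket. In the example below, `fetch_all` and `fake_fetch_page` are hypothetical names, and the fake data source stands in for the API so the loop can be run without credentials.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect items from an offset-paginated source until an empty page arrives."""
    items = {}
    offset = 0
    while True:
        page = fetch_page(offset=offset, count=page_size)
        if not page:  # an empty list or dict signals there is nothing left
            break
        items.update(page)
        offset += page_size
    return items


# Stand-in for the API: 2500 fake items keyed by id
fake_items = {str(i): {"item_id": str(i)} for i in range(2500)}


def fake_fetch_page(offset, count):
    """Return one slice of fake_items as a dict, or [] once the data runs out."""
    keys = list(fake_items)[offset:offset + count]
    return {key: fake_items[key] for key in keys} or []


all_items = fetch_all(fake_fetch_page)
```

With the notebook's client, `fetch_page` could be a small wrapper such as `lambda offset, count: p.get(since=oldest_date, state="archive", count=count, sort='oldest', offset=offset)[0]['list']`.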


In [43]:
# read_articles.columns
# read_articles.info()
# read_articles.head()

In [118]:
# convert unix time to datetime
# (timestamps arrive as strings, so cast them to numbers before converting)
read_articles['time_added'] = pd.to_datetime(pd.to_numeric(read_articles['time_added']), unit='s')
read_articles['time_updated'] = pd.to_datetime(pd.to_numeric(read_articles['time_updated']), unit='s')

# replace zeros (never favorited) with nan
read_articles.loc[read_articles['time_favorited'] == '0','time_favorited'] = np.nan
read_articles['time_favorited'] = pd.to_datetime(pd.to_numeric(read_articles['time_favorited']), unit='s')
read_articles['time_read'] = pd.to_datetime(pd.to_numeric(read_articles['time_read']), unit='s')

In [119]:
# total articles read
len(read_articles)

6080

In [120]:
# save to csv
read_articles.to_csv("data/pocket_read_articles.csv", index=False, encoding='utf-8')
print("Read Articles Exported to CSV")

Read Articles Exported to CSV
