### Extract Data from Kindle's My Clippings.txt
---
- [Generate ipynb files](#Generate-IPYNB-Files)
- [Generate markdown string](#Generate-Markdown-String)
- [View in Tabular format](#Tabular-HTML-Format)

In [1]:
import json
from IPython.display import HTML, display
from tabulate import tabulate

empty_contents = 'empty_notebook.json'
my_clippings = "My Clippings - Kindle.txt" # path to your clippings

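# utf-8-sig reads UTF-8 and drops a leading byte-order mark if the file has one
raw_file = open(my_clippings, "r", encoding="utf-8-sig")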

In [2]:
def parse_empty_notebook():
    """
    Load the empty notebook template (JSON) that each book's .ipynb is built from.
    """
    with open(empty_contents) as json_file:
        data = json.load(json_file)
    return data
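
The contents of `empty_notebook.json` are not shown in this notebook. The later cells only rely on `file_content["cells"][0]["source"]` being a list, so a template along the following lines should be enough. This is a sketch assuming notebook format version 4; `make_empty_notebook` is a hypothetical helper and the author's actual template may carry more metadata.

```python
import json


def make_empty_notebook():
    """Return a minimal nbformat-4 notebook with one empty markdown cell."""
    return {
        "cells": [
            # The generation loop appends its markdown lines to this list.
            {"cell_type": "markdown", "metadata": {}, "source": []}
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Serialise the template; writing this string to empty_notebook.json
# would give parse_empty_notebook() something to load.
template_json = json.dumps(make_empty_notebook(), indent=2)
```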

In [3]:
def prepare_dict_data():
    """
    Prepares the raw file content as a dict for easy manipulation.

    Keys are book title lines; each value is a list of clippings, and each
    clipping is [metadata line, blank line, highlighted text, separator].
    """
    block_dict = {}  # books mapped to their highlights / notes
    block_container = []  # lines of the clipping currently being read
    for x in raw_file:
        block_container.append(x)
        if x.startswith('==='):  # separator line ends a clipping
            book_name = block_container.pop(0)
            if book_name not in block_dict:
                block_dict[book_name] = [block_container]
            else:
                block_dict[book_name].append(block_container)
            block_container = []
    return block_dict

book_dict_data = prepare_dict_data()
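
To make the block layout concrete, here is a self-contained sketch of the same grouping idea that works on a string instead of the open file. It assumes each clipping is a title line, a metadata line, a blank line, the text, and a line of equals signs (only the `===` prefix is confirmed by the code above). `parse_clippings` and `SAMPLE` are illustrative names, not part of the notebook.

```python
SAMPLE = """\
Deep Work (Cal Newport)
- Your Highlight on Location 45-46 | Added on Friday, November 8, 2019 9:38:50 AM

Twain’s study was so isolated from the main house that his family took to blowing a horn to attract his attention for meals.
==========
How to Win Friends and Influence People (Dale Carnegie)
- Your Note on page 7 | Location 93 | Added on Monday, April 27, 2020 3:48:21 PM

Filled with energy
==========
How to Win Friends and Influence People (Dale Carnegie)
- Your Highlight on page 32 | Location 479-479 | Added on Tuesday, December 10, 2019 2:30:36 PM

Don’t criticize, condemn or complain.
==========
"""


def parse_clippings(text):
    """Group clippings by book title.

    Returns {title: [(metadata_line, clipping_text), ...]}.
    """
    books = {}
    block = []
    for line in text.splitlines():
        if not line.startswith("==="):
            block.append(line)
            continue
        # Separator reached: block holds [title, metadata, blank, text...].
        if len(block) >= 2:
            title = block[0].lstrip("\ufeff").strip()  # drop a stray BOM
            meta = block[1].strip()
            body = "\n".join(block[2:]).strip()
            books.setdefault(title, []).append((meta, body))
        block = []
    return books


for title, clips in parse_clippings(SAMPLE).items():
    print(title, len(clips))
```

Unlike `prepare_dict_data`, this variant strips newlines and keeps only the metadata and text, which is a design choice for the sketch rather than a description of the notebook's data.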

In [4]:
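import os

def write_ipynb(content, book_name):
    """
    Write the notebook JSON to book_extracts/<book name>.ipynb.
    """
    json_content = json.dumps(content, ensure_ascii=False, indent=2)
    os.makedirs("book_extracts", exist_ok=True)  # create the output folder if missing
    # book_name still carries the trailing newline read from the clippings file
    file_name = book_name.strip()
    with open(f"book_extracts/{file_name}.ipynb", "w", encoding="utf-8") as f:
        f.write(json_content)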
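
Book titles go straight into the file path in `write_ipynb`, and a title such as "Clean Agile: Back to Basics (Robert C. Martin)" contains a colon, which Windows does not allow in file names. A small helper like the one below could clean the title first. `safe_filename` is a hypothetical name, and the set of replaced characters is an assumption based on common Windows restrictions.

```python
import re


def safe_filename(title, replacement="_"):
    """Return a version of title that is usable as a file name."""
    # Replace runs of characters that are commonly rejected in file names,
    # plus any line breaks carried over from the clippings file.
    cleaned = re.sub(r'[\\/:*?"<>|\r\n]+', replacement, title.strip())
    # Trailing dots and spaces are also problematic on Windows.
    return cleaned.strip(" .") or "untitled"


print(safe_filename("Clean Agile: Back to Basics (Robert C. Martin)\n"))
```

The result could then be used in place of `book_name` when building the path.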

### Generate IPYNB Files
---

In [5]:
# Build one notebook per book from the parsed clippings.
for book_name, clippings in book_dict_data.items():
    file_content = parse_empty_notebook()
    source = file_content["cells"][0]["source"]
    heading = "# " + book_name
    source.extend([heading, '\n'])
    for clipping in clippings:
        # clipping = [metadata line, blank line, highlighted text, separator]
        highlighted_text = '> ' + clipping[2] + '\n\n'
        source.extend([highlighted_text, '\n', '\n'])
        meta_parts = clipping[0].strip('- ').split('|')
        # Second-to-last token of the first field: the page number, or the
        # location range when the clipping has no page.
        page_no = meta_parts[0].split(' ')[-2]
        highlight_time = meta_parts[-1]
        source.append(f"<sup>Page: {page_no} - {highlight_time} </sup>")  # timestamp
        source.extend(['\n', '\n', '---', '\n', '\n'])
    write_ipynb(file_content, book_name)
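
The loop above pulls the page number out of the metadata line by splitting on `|` and spaces. For comparison, here is a sketch that parses the same line with a regular expression and turns the timestamp into a `datetime`. The pattern is inferred only from the sample lines in this notebook (English wording, optional `page` and `Location` fields), so other Kindle languages or firmware versions may need a different pattern. `parse_meta`, `META_RE` and `ADDED_FORMAT` are illustrative names.

```python
import re
from datetime import datetime

META_RE = re.compile(
    r"Your (?P<kind>\w+) on "
    r"(?:page (?P<page>\S+) \| )?"
    r"(?:Location (?P<location>[\d-]+) \| )?"
    r"Added on (?P<added>.+)$"
)
# Example: "Friday, November 8, 2019 9:38:50 AM" (English month/day names).
ADDED_FORMAT = "%A, %B %d, %Y %I:%M:%S %p"


def parse_meta(line):
    """Split a metadata line into kind, page, location and timestamp.

    Returns None when the line does not match the assumed layout.
    """
    match = META_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["added"] = datetime.strptime(info["added"].strip(), ADDED_FORMAT)
    return info


meta = parse_meta(
    "- Your Note on page 7 | Location 93 | Added on Monday, April 27, 2020 3:48:21 PM"
)
print(meta["kind"], meta["page"], meta["location"], meta["added"].isoformat())
```

Keeping page and location as separate fields avoids labelling a location range as a page when a clipping has no page number.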

### Generate Markdown String
---

In [6]:
# raw_file was already read to the end by prepare_dict_data() in cell 3,
# so the markdown is built from book_dict_data; the handle only needs closing.
raw_file.close()
markdown = ""
for key, val in book_dict_data.items():
    heading = "#### " + key
    markdown += heading
    markdown += '\n\n'
    for a in val:
        markdown += '> ' + a[2] + '\n\n'
        markdown += a[0]
        markdown += '\n\n\n'
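
The string building above can also be written as a small function that collects the pieces in a list and joins them once, which is easier to reuse and test. This is only a sketch: `render_markdown` is a hypothetical name, and it assumes the clippings are given as `(metadata, text)` pairs with whitespace already stripped, which differs from the raw line lists stored in `book_dict_data`.

```python
def render_markdown(books, heading_level=4):
    """Render {title: [(metadata, text), ...]} as one markdown string."""
    parts = []
    for title, clips in books.items():
        parts.append(f"{'#' * heading_level} {title}")
        for meta, text in clips:
            parts.append(f"> {text}")  # the highlight as a blockquote
            parts.append(meta)         # the metadata line under it
    # A blank line between parts keeps headings, quotes and lists separate.
    return "\n\n".join(parts) + "\n"


example = {
    "Clean Agile: Back to Basics (Robert C. Martin)": [
        (
            "- Your Highlight on page 104 | Location 1587-1588 | "
            "Added on Monday, March 16, 2020 10:00:36 PM",
            "The most honest estimate is “I don’t know.”",
        )
    ]
}
print(render_markdown(example))
```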

### Code cell to view markdown
---

In [7]:
from IPython.display import Markdown as md
md(markdown)

#### Deep Work (Cal Newport)


> Twain’s study was so isolated from the main house that his family took to blowing a horn to attract his attention for meals.


- Your Highlight on Location 45-46 | Added on Friday, November 8, 2019 9:38:50 AM



#### How to Win Friends and Influence People (Dale Carnegie)


> Filled with energy


- Your Note on page 7 | Location 93 | Added on Monday, April 27, 2020 3:48:21 PM



> Don’t criticize, condemn or complain.


- Your Highlight on page 32 | Location 479-479 | Added on Tuesday, December 10, 2019 2:30:36 PM



#### Clean Agile: Back to Basics (Robert C. Martin)


> The most honest estimate is “I don’t know.”


- Your Highlight on page 104 | Location 1587-1588 | Added on Monday, March 16, 2020 10:00:36 PM



> And besides, humans are not machines. Asking humans to do what machines can do is expensive, inefficient, and immoral.


- Your Highlight on page 102 | Location 1562-1563 | Added on Monday, March 16, 2020 9:31:36 PM





### Tabular HTML Format
---

In [8]:
block_list = []  # one row per clipping: [book, metadata, blank line, content]
block_container = []  # lines of the clipping currently being read
with open(my_clippings, "r", encoding="utf-8-sig") as raw_file:
    for x in raw_file:
        block_container.append(x)
        if x.startswith('==='):
            block_container.pop()  # drop the '===' separator line
            block_list.append(block_container)
            block_container = []
headers = ['Book', 'Time', 'Empty Col', 'Content']
display(HTML(tabulate(block_list, headers, tablefmt="html", stralign="left")))

Book,Time,Empty Col,Content
Deep Work (Cal Newport),"- Your Highlight on Location 45-46 | Added on Friday, November 8, 2019 9:38:50 AM",,Twain’s study was so isolated from the main house that his family took to blowing a horn to attract his attention for meals.
How to Win Friends and Influence People (Dale Carnegie),"- Your Note on page 7 | Location 93 | Added on Monday, April 27, 2020 3:48:21 PM",,Filled with energy
Clean Agile: Back to Basics (Robert C. Martin),"- Your Highlight on page 104 | Location 1587-1588 | Added on Monday, March 16, 2020 10:00:36 PM",,The most honest estimate is “I don’t know.”
Clean Agile: Back to Basics (Robert C. Martin),"- Your Highlight on page 102 | Location 1562-1563 | Added on Monday, March 16, 2020 9:31:36 PM",,"And besides, humans are not machines. Asking humans to do what machines can do is expensive, inefficient, and immoral."
How to Win Friends and Influence People (Dale Carnegie),"- Your Highlight on page 32 | Location 479-479 | Added on Tuesday, December 10, 2019 2:30:36 PM",,"Don’t criticize, condemn or complain."
