# Obsidian Tools

A Jupyter Notebook for accessing and analysing Obsidian vaults with Pandas and NetworkX.

#### Import packages

In [None]:
# Obsidiantools Requirements
import numpy as np
import pandas as pd
import networkx as nx
import obsidiantools.api as otools

import os
from pathlib import Path

# Matplotlib for visualisation
import matplotlib.pyplot as plt
%matplotlib inline

## Set vault directory

In [None]:
# Request the vault path from the user and convert it to an absolute path
vault_dir = Path(os.path.abspath(input('Vault directory: ')))  # Note: currently this requires a Linux-style path
# Confirm the path exists
vault_dir.exists()

## Connect and Gather the Vault Contents

The **Vault** object has two methods that must be called to prepare the Obsidian vault for analysis:

- **.connect()** - Connects the vault contents into a graph. This gives access to vault metadata and lets you look up notes and the links between them.
- **.gather()** - Gathers the content of the vault's notes. This builds a master index of notes and gives access to the plaintext content of individual notes.

In [None]:
# Prepare vault for analysis
vault = otools.Vault(vault_dir).connect().gather()
# Confirm vault path
print(vault.dirpath)
# Confirm vault readiness
print(f"Connected? - {vault.is_connected}")
print(f"Gathered?  - {vault.is_gathered}")

## Access Vault Contents

#### List files in vault

In [None]:
vault.file_index

#### Filter list by subdirectory

In [None]:
# Note: currently this requires a Linux-style path
(otools.Vault(vault_dir, include_subdirs=['docs/Concepts'], include_root=False)
 .file_index)
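For readers who want to see what a subdirectory filter involves, here is a rough standard-library approximation. The helper `list_notes` is hypothetical and is not how obsidiantools implements `include_subdirs` or `include_root`; it simply collects `.md` files under the given subdirectories (recursively) and, optionally, the notes sitting directly in the vault root. Because `pathlib` accepts forward slashes on every platform, a subdirectory name such as `'docs/Concepts'` also works on Windows here.

```python
from pathlib import Path


def list_notes(vault_dir, subdirs, include_root=True):
    """Hypothetical helper: markdown files in `subdirs` (and optionally the vault root).

    Returns paths relative to the vault as POSIX-style strings, sorted.
    """
    vault_dir = Path(vault_dir)
    found = set()
    for subdir in subdirs:
        # rglob recurses into nested folders below each requested subdirectory
        found.update((vault_dir / subdir).rglob('*.md'))
    if include_root:
        # glob (not rglob) picks up only the notes directly in the vault root
        found.update(vault_dir.glob('*.md'))
    return sorted(p.relative_to(vault_dir).as_posix() for p in found if p.is_file())


# Usage (assumes vault_dir from the cell above):
# list_notes(vault_dir, ['docs/Concepts'], include_root=False)
```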

#### Identify Isolated Notes

In Obsidian, isolated or 'orphan' notes are notes with **no backlinks** and **no wikilinks**. Wikilinks, written as `[[TITLE OF LINKED NOTE]]`, are the preferred way to create internal links between notes.
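The orphan definition can be expressed directly in terms of the vault's link indexes. This is a minimal sketch, assuming (not confirmed here) that both indexes map each note name to a list of linked note names, in the style of `vault.wikilinks_index` and `vault.backlinks_index`; `find_isolated_notes` is a hypothetical helper, not part of obsidiantools.

```python
def find_isolated_notes(wikilinks_index, backlinks_index):
    """Hypothetical helper: notes with no outgoing wikilinks and no incoming backlinks.

    Both arguments are assumed to map note name -> list of note names, and the
    keys of wikilinks_index are assumed to be the notes that exist as files.
    """
    return sorted(
        note
        for note in wikilinks_index
        if not wikilinks_index[note] and not backlinks_index.get(note)
    )


# Usage (assumes a connected vault):
# find_isolated_notes(vault.wikilinks_index, vault.backlinks_index)
```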

In [None]:
vault.isolated_notes

#### Identify Non-existent Notes

In Obsidian, it is possible to link to notes that don't exist yet. These non-existent notes appear in the vault graph and **have backlinks**, but they do not exist as markdown files.

**NOTE:** Obsidian can mistakenly read nested arrays in inline code, such as `[[1, 2], [3, 4]]`, as internal wikilinks, which creates unexpected backlinks.
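Non-existent notes are simply link targets with no matching file. The sketch below uses a deliberately naive, hypothetical wikilink pattern (not the parser that Obsidian or obsidiantools actually uses) to show both the idea and how the nested-array false positive can arise: a pattern that does not skip code spans treats `[[1, 2], [3, 4]]` as a link to a note called `1, 2], [3, 4`.

```python
import re

# Hypothetical naive pattern: shortest text between [[ and ]].
# It does not skip code spans, which is what makes the false positive possible.
WIKILINK_RE = re.compile(r'\[\[(.+?)\]\]')


def extract_wikilinks(text):
    """Return every wikilink target in `text`, dropping any |alias or #heading part."""
    return [re.split(r'[|#]', raw, maxsplit=1)[0].strip() for raw in WIKILINK_RE.findall(text)]


def find_nonexistent_notes(wikilinks_index, existing_notes):
    """Link targets (from a note -> targets mapping) that match no existing note."""
    existing = set(existing_notes)
    return sorted({
        target
        for targets in wikilinks_index.values()
        for target in targets
        if target not in existing
    })


text = "See [[Obsidian]] and [[Markdown|md]]. Code: `x = [[1, 2], [3, 4]]`"
links = {'Note': extract_wikilinks(text)}
print(find_nonexistent_notes(links, ['Note', 'Obsidian']))
# prints: ['1, 2], [3, 4', 'Markdown']
```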

In [None]:
vault.nonexistent_notes

## Analyse Vault

#### Create Dataframe of Vault Metadata

In [None]:
df = vault.get_note_metadata()
df.head()

#### Summarise Vault Metadata

In [None]:
df.info()

### Analyse Backlinks

#### Sort Notes by Number of Backlinks

In [None]:
df.sort_values('n_backlinks', ascending=False)

#### Get List of Backlinks For Specific Notes

In [None]:
vault.get_backlinks('Obsidian')

#### Get List of Backlinks with Counts

In [None]:
vault.get_backlink_counts('Obsidian')

#### View Backlinks Index

In [None]:
vault.backlinks_index
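Conceptually, a backlinks index is a wikilinks index turned inside out: if note A links to note B, then A is a backlink of B. A minimal sketch, assuming each index maps a note name to a list of note names; `build_backlinks_index` is hypothetical and not an obsidiantools function. Link targets without a file still receive an entry, which is how non-existent notes end up with backlinks.

```python
def build_backlinks_index(wikilinks_index):
    """Hypothetical helper: invert a note -> outgoing-links mapping into note -> backlinks."""
    # Every note in the input starts with an empty backlink list
    backlinks = {note: [] for note in wikilinks_index}
    for source, targets in wikilinks_index.items():
        for target in targets:
            # Targets with no file of their own (non-existent notes) get an entry too;
            # repeated links from the same source are kept, one entry per link
            backlinks.setdefault(target, []).append(source)
    return backlinks


# Usage (assumes vault.wikilinks_index maps note name -> list of linked notes):
# build_backlinks_index(vault.wikilinks_index)
```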

### Analyse Wikilinks

#### Sort Notes by Number of Wikilinks

In [None]:
df.sort_values('n_wikilinks', ascending=False)

#### Get List of Wikilinks For Specific Notes

In [None]:
vault.get_wikilinks('Obsidian')

#### Get List of Wikilinks with Counts

In [None]:
# vault.get_wikilink_counts('Obsidian')
# This functionality is not yet implemented
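Until a built-in method exists, counts can be approximated by tallying the list returned by `vault.get_wikilinks()`. This is a sketch under the assumption that `get_wikilinks()` returns one entry per link occurrence (so a note linked twice appears twice); `wikilink_counts` is a hypothetical helper, not the missing obsidiantools method.

```python
from collections import Counter


def wikilink_counts(wikilinks):
    """Hypothetical helper: map each linked note to how often it is linked, most frequent first."""
    return dict(Counter(wikilinks).most_common())


# Usage (assumes a connected vault):
# wikilink_counts(vault.get_wikilinks('Obsidian'))
```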

#### View Wikilinks Index

In [None]:
vault.wikilinks_index

### Analyse Embedded Files

#### Sort Notes by Number of Embedded Files

In [None]:
df.sort_values('n_embedded_files', ascending=False)

#### Get List of Embedded Files For Specific Notes

In [None]:
vault.get_embedded_files('Obsidian')

#### View Embedded Files Index

In [None]:
vault.embedded_files_index
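In Obsidian, embedding uses the wikilink syntax prefixed with `!`, as in `![[diagram.png]]`. The hypothetical pattern below is a rough illustration of how embeds can be told apart from ordinary wikilinks; it is not how obsidiantools builds `embedded_files_index`.

```python
import re

# Hypothetical pattern: ![[target]] or ![[target|size]], capturing only the target
EMBED_RE = re.compile(r'!\[\[([^\[\]|#]+)(?:[#|][^\[\]]*)?\]\]')


def extract_embedded_files(text):
    """Return the targets of all ![[...]] embeds in `text`; plain [[links]] are ignored."""
    return [target.strip() for target in EMBED_RE.findall(text)]


print(extract_embedded_files('![[diagram.png]] links to [[Obsidian]] and shows ![[photo.jpg|300]]'))
# prints: ['diagram.png', 'photo.jpg']
```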

### Analyse Front Matter

In [None]:
vault.get_front_matter('Obsidian')

### Analyse Tags

In [None]:
vault.get_tags('Obsidian')

#### View Tags Index

In [None]:
vault.tags_index
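`tags_index` is organised by note. To see which notes share a tag, or which tags are most common, the mapping can be inverted. A minimal sketch, assuming `tags_index` maps each note name to a list of tag strings; both helpers are hypothetical.

```python
from collections import Counter


def notes_by_tag(tags_index):
    """Hypothetical helper: invert note -> tags into tag -> sorted list of notes."""
    by_tag = {}
    for note, tags in tags_index.items():
        for tag in set(tags):  # a tag repeated within one note is counted once
            by_tag.setdefault(tag, []).append(note)
    return {tag: sorted(notes) for tag, notes in by_tag.items()}


def tag_frequencies(tags_index):
    """Hypothetical helper: number of notes using each tag, most common first."""
    counts = Counter(tag for tags in tags_index.values() for tag in set(tags))
    return dict(counts.most_common())


# Usage (assumes a gathered vault):
# notes_by_tag(vault.tags_index)
# tag_frequencies(vault.tags_index)
```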

## Visualise The Vault with NetworkX

#### Map Node Colours to Existence / Non-Existence of Notes

In [None]:
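# Grey for non-existent notes, purple for notes that exist as files
color_cat_map = {False: '#D3D3D3', True: '#826ED9'}
# nx.draw matches node_color to the graph's node order, so align the
# metadata (assumed to be indexed by note name) to that order first
color_vals = (df['note_exists']
              .reindex(list(vault.graph.nodes))
              .map(color_cat_map)
              .values)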

#### Plot Network

In [None]:
fig, ax = plt.subplots(figsize=(12,10))
pos = nx.fruchterman_reingold_layout(vault.graph)
nx.draw(vault.graph, pos=pos, node_color=color_vals, with_labels=True, ax=ax)
ax.set_title('Vault graph')
plt.show()

## Graph Analysis

#### Get Node Centrality Using Pagerank

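PageRank models a reader who follows links at random and occasionally jumps to a random note; a note's score is the long-run share of time that reader spends on it. The highest-ranked notes are therefore those with backlinks from the broadest range of other notes, especially from notes that are themselves highly ranked.

The PageRank score reflects not only the number of backlinks but also their quality. Each note splits its own score across its outgoing links, so many backlinks from the same note typically count for less than the same number of backlinks from different notes.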
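This behaviour can be seen in a small, self-contained power-iteration version of PageRank. It is a sketch for illustration, not NetworkX's implementation, although it follows broadly the same standard formulation (damping factor 0.85, notes without outgoing links spreading their score evenly over all notes). `links` is a hypothetical note -> outgoing-links mapping in which a repeated target means a repeated link.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Minimal PageRank by power iteration (illustrative sketch).

    links maps each note to the notes it links to (repeats allowed). Each note
    splits its score across its outgoing links in proportion to how often it
    links to each target.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Score held by dangling notes (no outgoing links) is shared equally
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new[target] += share
        converged = sum(abs(new[node] - rank[node]) for node in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# A links to C three times; B links to D once. C and D end up with (numerically)
# equal scores: repeating a link from the same note adds no new source of score.
scores = pagerank({'A': ['C', 'C', 'C'], 'B': ['D']})
```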

In [None]:
(pd.Series(nx.pagerank(vault.graph), name='pagerank')
 .sort_values(ascending=False))

#### Check Quality of Backlinks By Reviewing Note Text

In [None]:
note_text = vault.get_text('Obsidian')
print(note_text)
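To review a backlink in context rather than reading the whole note, one option is to pull out only the lines that link to the target note. This sketch assumes the text returned by `vault.get_text()` still contains the raw `[[...]]` link syntax, which may not hold if the method returns a cleaned-up version; `link_context` is a hypothetical helper.

```python
import re


def link_context(text, target):
    """Hypothetical helper: lines of `text` containing a wikilink to `target`.

    Matches [[target]], [[target|alias]] and [[target#heading]], but not
    longer note names that merely start with `target`.
    """
    pattern = re.compile(r'\[\[' + re.escape(target) + r'(?:[#|][^\[\]]*)?\]\]')
    return [line.strip() for line in text.splitlines() if pattern.search(line)]


# Usage (assumes a connected and gathered vault): show how each backlinking note refers to 'Obsidian'
# for source in sorted(set(vault.get_backlinks('Obsidian'))):
#     print(source, link_context(vault.get_text(source), 'Obsidian'))
```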