<a href="https://colab.research.google.com/github/usc-isi-i2/kgtk-aaai2023/blob/main/04-MoralityInEvents.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to the KGTK-Browser tutorial!

In this section we will look at how to install and run kgtk-browser from our GitHub repository. We use kgtk-browser to explore the knowledge graph in a visual, more user-oriented way.

More specifically, we will focus on morality in events, illustrating how kgtk-browser can be used to access and visualize custom data that uses the same format as Wikidata.


### Step 1

First, we will install the dependencies for the browser.


### Step 2

In Step 2, we will use some of the existing kgtk-notebooks to split the data into separate files containing claims, labels, aliases, etc. We will also calculate the pagerank of the nodes, which the browser later uses for search.


### Step 3

Finally, we will use yet another kgtk-notebook to create the graph cache db. We will then build and launch the user-facing part of the browser. It uses a simple Flask server on the backend and React.js on the frontend, which makes the browser easy to tailor to the specific needs of the end user.



In [1]:
#@title Check memory


from psutil import virtual_memory
# Note: .total is the machine's total RAM, not the amount currently free
ram_gb = virtual_memory().total / 1e9
print('\n\nYour runtime has {:.1f} gigabytes of available RAM\n\n'.format(ram_gb))



Your runtime has 27.3 gigabytes of available RAM




# Step 1: Install dependencies

In [1]:
#@title Install condacolab

%%time
%%capture


!pip install condacolab


import condacolab
condacolab.install()

CPU times: user 21.3 ms, sys: 13.4 ms, total: 34.7 ms
Wall time: 1.73 s


In [2]:
#@title Check condacolab was installed

import condacolab
condacolab.check()

‚ú®üç∞‚ú® Everything looks OK!


In [3]:
#@title Clone the repositories
# - KGTK Browser
# - KGTK Notebooks

%%time
%%capture


# Get the kgtk-notebook repository as well
!git clone https://github.com/usc-isi-i2/kgtk-notebooks


# Get the latest kgtk-browser from GitHub
!git clone https://github.com/usc-isi-i2/kgtk-browser


# Install the requirements (i.e. kgtk)
!pip install -r kgtk-browser/requirements.txt

CPU times: user 2.07 s, sys: 395 ms, total: 2.46 s
Wall time: 3min 52s


In [4]:
#@title Create a new conda environment

%%time
%%capture


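# `unset` takes the variable name, not its value.
# Each `!` line runs in its own subshell, so this does not carry over to the lines below.
!unset PYTHONPATH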
!conda create --name kgtk-env python=3.8.15 --yes
!conda env update --name kgtk-env --file /content/kgtk-browser/environment.yml

CPU times: user 3.03 s, sys: 446 ms, total: 3.48 s
Wall time: 5min 58s


In [5]:
#@title Register environment as a kernel
# this is used later on when we run other notebooks
# we will be able to pass that environment on to them

!ipython kernel install --name kgtk-env --user

Installed kernelspec kgtk-env in /root/.local/share/jupyter/kernels/kgtk-env
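
Papermill later selects this environment by kernel name, so it can be worth confirming which Python interpreter the registered kernelspec points at. The sketch below reads the `kernel.json` file that Jupyter stores for each kernelspec and returns the first entry of its `argv` list. The helper name `kernel_python`, the throwaway demo directory, and the interpreter path in the demo are hypothetical. In this notebook you would pass `/root/.local/share/jupyter/kernels`, the location reported in the output above.

```python
import json
import os
import tempfile


def kernel_python(kernels_dir, name):
    """Return the interpreter path recorded in a kernelspec, or None if it is absent."""
    spec_path = os.path.join(kernels_dir, name, "kernel.json")
    if not os.path.exists(spec_path):
        return None
    with open(spec_path, encoding="utf-8") as handle:
        spec = json.load(handle)
    # argv[0] is the executable Jupyter launches for this kernel
    return spec["argv"][0]


# Demo against a throwaway kernelspec; the interpreter path is hypothetical
demo_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_dir, "kgtk-env"))
demo_spec = {
    "argv": ["/usr/local/envs/kgtk-env/bin/python", "-m", "ipykernel_launcher",
             "-f", "{connection_file}"],
    "display_name": "kgtk-env",
    "language": "python",
}
with open(os.path.join(demo_dir, "kgtk-env", "kernel.json"), "w", encoding="utf-8") as handle:
    json.dump(demo_spec, handle)

print(kernel_python(demo_dir, "kgtk-env"))
```

If the returned path is not inside the `kgtk-env` environment, notebooks run with that kernel name would use a different interpreter than intended.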


In [9]:
!conda list -n kgtk-env | grep kgtk
!conda list -n kgtk-env | grep sqlite

# packages in environment at /usr/local/envs/kgtk-env:
kgtk                      1.5.2                    pypi_0    pypi
libsqlite                 3.40.0               h753d276_0    conda-forge
sqlite                    3.40.0               h4ff8645_0    conda-forge
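
If you want to check these versions programmatically rather than by eye, the listing can be parsed into a dictionary. This is a minimal sketch that assumes the whitespace-separated layout shown above (name, version, build, channel); the helper name `parse_conda_list` is our own, and the sample text is copied from the output of the previous cell.

```python
def parse_conda_list(text):
    """Map package name -> version from `conda list` style output."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# packages in environment ..." header
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


sample = """\
# packages in environment at /usr/local/envs/kgtk-env:
kgtk                      1.5.2                    pypi_0    pypi
libsqlite                 3.40.0               h753d276_0    conda-forge
sqlite                    3.40.0               h4ff8645_0    conda-forge
"""
versions = parse_conda_list(sample)
print(versions["kgtk"], versions["sqlite"])
```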


In [6]:
#@title Install the graph-tool library

%%time
%%capture


!conda install --channel conda-forge graph-tool --yes

CPU times: user 1.24 s, sys: 192 ms, total: 1.43 s
Wall time: 2min 3s


In [7]:
#@title Install KGTK and Papermill

%%time
%%capture


# install KGTK from pip
!pip install kgtk


# install papermill
!pip install papermill

CPU times: user 134 ms, sys: 48.8 ms, total: 183 ms
Wall time: 13.9 s


In [8]:
#@title Install sqlite db

%%time
%%capture


# install sqlite3
!apt-get install sqlite3=3.34.1 --yes

CPU times: user 27.5 ms, sys: 11.2 ms, total: 38.7 ms
Wall time: 2.72 s


# Step 2: Set up the KGTK Graph Cache


In [10]:
#@title Configure KGTK variables

%%time
%%capture

# Import the main kgtk package
from kgtk.functions import kgtk, kypher
from kgtk.configure_kgtk_notebooks import ConfigureKGTK

# Minimal KGTK configuration for this example
ck = ConfigureKGTK(['all'])
ck.configure_kgtk(
    graph_cache_path='/content/wikidata.sqlite3.db',
    output_path='./output',
    project_name='kgtk-tutorial',
)

CPU times: user 477 ms, sys: 83.8 ms, total: 560 ms
Wall time: 2.78 s


In [11]:
#@title Check environment variables

# Review all of the environment variables used
ck.print_env_variables()

TEMP: ./output/kgtk-tutorial/temp.kgtk-tutorial
STORE: /content/wikidata.sqlite3.db
EXAMPLES_DIR: //examples
kypher: kgtk query --graph-cache /content/wikidata.sqlite3.db
KGTK_OPTION_DEBUG: false
KGTK_LABEL_FILE: /root/isi-kgtk-tutorial/kgtk-tutorial_input/labels.en.tsv.gz
kgtk: kgtk
OUT: ./output/kgtk-tutorial
KGTK_GRAPH_CACHE: /content/wikidata.sqlite3.db
USE_CASES_DIR: //use-cases
GRAPH: /root/isi-kgtk-tutorial/kgtk-tutorial_input
all: /root/isi-kgtk-tutorial/kgtk-tutorial_input/all.tsv.gz
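
The next cells read several of these variables through `os.environ`, so a missing one only surfaces as a `KeyError` in the middle of a long run. As a small sketch, we could check them up front. The helper name `missing_env_vars` is hypothetical; the variable names are the ones printed above.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Variables that the partitioning step reads from the environment
required = ["TEMP", "OUT", "all"]
missing = missing_env_vars(required)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```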


In [12]:
#@title Partition our data into separate files

%%time
%%capture


import os
import papermill as pm


pm.execute_notebook(
    "/content/kgtk-notebooks/use-cases/create_wikidata/partition-wikidata.ipynb",
    os.environ["TEMP"] + "/partition-wikidata.out.ipynb",
    parameters=dict(
        wikidata_input_path = os.environ["all"],
        wikidata_parts_path = os.environ["OUT"] + "/parts",
        temp_folder_path = os.environ["OUT"] + "/parts/temp",
        sort_extras = "--buffer-size 30% --temporary-directory $OUT/parts/temp",
        verbose = False,
        gzip_command = 'gzip'
    )
)

CPU times: user 3.76 s, sys: 360 ms, total: 4.12 s
Wall time: 3min 29s
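
Because the cell above captures all output, it can help to peek at one of the generated files before moving on. The sketch below reads the header and first rows of a gzipped TSV file using only the standard library. The helper name `peek_tsv_gz` and the demo file are hypothetical; the demo assumes the usual KGTK edge columns (`id`, `node1`, `label`, `node2`). On the real data you would point it at a file under `$OUT/parts`, for example `claims.tsv.gz`.

```python
import csv
import gzip
import os
import tempfile
from itertools import islice


def peek_tsv_gz(path, n=5):
    """Return the header and the first n data rows of a gzipped TSV file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as plain data
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, [])
        rows = list(islice(reader, n))
    return header, rows


# Demo on a tiny, made-up edge file
demo_path = os.path.join(tempfile.mkdtemp(), "demo.tsv.gz")
with gzip.open(demo_path, "wt", encoding="utf-8", newline="") as handle:
    handle.write("id\tnode1\tlabel\tnode2\n")
    handle.write("e1\tQ1\tP31\tQ2\n")

header, rows = peek_tsv_gz(demo_path)
print(header)
print(rows)
```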


In [13]:
#@title Calculate pagerank for all the nodes
# this is necessary for the search

%%time
%%capture


kgtk("""
  --debug graph-statistics
  -i /content/output/kgtk-tutorial/parts/claims.tsv.gz
  -o /content/output/kgtk-tutorial/parts/metadata.pagerank.undirected.tsv.gz
  --compute-pagerank True
  --compute-hits False
  --page-rank-property Pundirected_pagerank
  --output-degrees False
  --output-pagerank True
  --output-hits False
  --output-statistics-only
  --undirected True
  --log-file ./output/kgtk-tutorial/temp.kgtk-tutorial/metadata.pagerank.undirected.summary.txt
""")

CPU times: user 338 ms, sys: 48.4 ms, total: 387 ms
Wall time: 49.9 s
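
To give some intuition for what the undirected pagerank score measures, here is a small pure-Python power iteration on a toy graph. This is an illustrative sketch only, not KGTK's implementation, which may differ in details such as convergence criteria. The function name, the damping value of 0.85, and the toy edges are our own assumptions.

```python
def undirected_pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank where every edge is followed in both directions."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        new_rank = {}
        for node in neighbors:
            # Each neighbor shares its current rank equally among its own neighbors
            incoming = sum(rank[m] / len(neighbors[m]) for m in neighbors[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Hypothetical toy graph: one hub connected to three leaves
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
ranks = undirected_pagerank(toy_edges)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

The hub ends up with the highest score because all three leaves pass their rank to it. The browser's search can use the same kind of score to put well-connected nodes first.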


In [14]:
#@title Create the graph cache db

%%time
%%capture


import os


pm.execute_notebook(
    "/content/kgtk-notebooks/use-cases/create_wikidata/KGTK-Query-Text-Search-Setup.ipynb",
    os.environ["TEMP"] + "/KGTK-Query-Text-Search-Setup.out.ipynb",
    kernel_name='kgtk-env',
    parameters=dict(
        input_path = '/content/output/kgtk-tutorial/parts',
        output_path = '/content/graph-cache-db/',
        project_name = 'kgtk-tutorial',
        create_class_viz = 'no',
        create_db = 'yes',
        create_es = 'no',
    )
)

CPU times: user 2.36 s, sys: 160 ms, total: 2.52 s
Wall time: 59.2 s
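
The graph cache is an SQLite database, so we can inspect it with Python's built-in `sqlite3` module before starting the browser. The sketch below lists the tables in a database file, opening it read-only so the cache cannot be modified by accident. The helper name `list_tables` and the demo database are hypothetical, and no particular table names are assumed for the real cache.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, opened read-only."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# Demo on a throwaway database with one made-up table
demo_db = os.path.join(tempfile.mkdtemp(), "demo.sqlite3.db")
connection = sqlite3.connect(demo_db)
connection.execute("CREATE TABLE edges (node1 TEXT, label TEXT, node2 TEXT)")
connection.commit()
connection.close()

tables = list_tables(demo_db)
print(tables)
```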


# Step 3: Build and run the kgtk-browser

In [15]:
#@title Build our frontend app

%%time
%%capture


%env PUBLIC_URL=/browser
%env REACT_APP_FRONTEND_URL=/browser

import os
os.environ['REACT_APP_USE_KGTK_KYPHER_BACKEND'] = '1'

!cd /content/kgtk-browser/app && npm install
!cd /content/kgtk-browser/app && npm run build

CPU times: user 773 ms, sys: 115 ms, total: 887 ms
Wall time: 1min 30s


In [16]:
#@title Open a connection to this notebook
# using the same port as the kgtk-browser backend

from google.colab.output import eval_js
print(eval_js("google.colab.kernel.proxyPort(3233)"))

https://dd1iipgmbg6-496ff2e9c6d22116-3233-colab.googleusercontent.com/
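
The frontend was built with `PUBLIC_URL=/browser`, so the app is presumably served under that path rather than at the root of the proxy URL. A minimal sketch for composing the address to open once the backend is running; the helper name `browser_url` is our own.

```python
from urllib.parse import urljoin


def browser_url(proxy_url, static_path="/browser"):
    """Join the Colab proxy URL with the path the frontend is served under."""
    return urljoin(proxy_url, static_path)


# URL printed by the cell above
proxy = "https://dd1iipgmbg6-496ff2e9c6d22116-3233-colab.googleusercontent.com/"
print(browser_url(proxy))
```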


In [None]:
#@title Run the browser backend


%env DEVELOPMENT=True
%env KGTK_BROWSER_STATIC_URL=/browser
%env KGTK_BROWSER_GRAPH_CACHE=/content/graph-cache-db/kgtk-tutorial/temp.kgtk-tutorial/wikidata.sqlite3.db


# Change into the browser directory and run the kgtk browser command
!cd /content/kgtk-browser/ && python kgtk_browser_app.py

env: DEVELOPMENT=True
env: KGTK_BROWSER_STATIC_URL=/browser
env: KGTK_BROWSER_GRAPH_CACHE=/content/graph-cache-db/kgtk-tutorial/temp.kgtk-tutorial/wikidata.sqlite3.db
 * Serving Flask app 'kgtk_browser_app' (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:3233
 * Running on http://172.28.0.12:3233
Press CTRL+C to quit
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 350-343-606
127.0.0.1 - - [07/Feb/2023 16:36:48] "GET / HTTP/1.1" 404 -
127.0.0.1 - - [07/Feb/2023 16:36:54] "GET /browser HTTP/1.1" 200 -
127.0.0.1 - - [07/Feb/2023 16:36:54] "GET /browser/static/js/2.3bc2afff.chunk.js HTTP/1.1" 200 -
127.0.0.1 - - [07/Feb/2023 16:36:54] "GET /browser/static/js/main.3c3c3740.chunk.js HTTP/1.1" 200 -
127.0.0.1 - - [07/Feb/2023 16:36:57] "GET /kb/info HTTP/1.1" 200 -
127.0.0.1 - - [07/Feb/2023 16:36:57] "GET /browser/static/js/3.670