Merge pull request #549 from usc-isi-i2/dev
Dev
saggu committed Oct 23, 2021
2 parents dbb28b8 + 7698a1b commit 95024bf
Showing 5 changed files with 72 additions and 13 deletions.
42 changes: 32 additions & 10 deletions README.md
@@ -20,6 +20,37 @@ KGTK can process Wikidata-sized KGs with billions of edges on a laptop. We have

KGTK is open source software, well documented, actively used and developed, and released using the MIT license. We invite the community to try KGTK. It is easy to get started with our tutorial notebooks available and executable online.

## Installation

> The following instructions install KGTK and the KGTK Jupyter Notebooks on
Linux and macOS systems.

If you want to install KGTK on a Microsoft Windows system, please
contact the KGTK team.

Our KGTK installations use a Conda virtual environment. If you
don't have the Conda tools installed, follow this
[guide](https://docs.conda.io/projects/conda/en/latest/user-guide/install/) to
install them. We recommend installing Miniconda rather than the
full Anaconda distribution.

Next, execute the following steps to install the latest stable release
of KGTK:

```bash
conda create -n kgtk-env python=3.8
conda activate kgtk-env
conda install -c conda-forge graph-tool
conda install -c conda-forge jupyterlab
pip --no-cache install -U kgtk
python -m spacy download en_core_web_sm
```

Please see our [installation document](/docs/install.md) for more details. If
you encounter problems with your installation, or are interested in a detailed
explanation of these commands, [read more about the installation procedure
here](KGTK-Installation-Procedure-Details.md).
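
After running the steps above, one quick way to confirm that the packages landed in the active environment is to check that Python can locate them. The following is a minimal sketch and not part of the KGTK documentation: the helper name `missing_modules` is hypothetical, and the import names `graph_tool`, `jupyterlab`, and `spacy` are assumed to correspond to the packages installed above.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed by the commands above.
    print(missing_modules(["kgtk", "graph_tool", "jupyterlab", "spacy"]))
```

An empty list means every listed module was found; any name that is printed points at a step worth repeating.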

## Getting started

### Online Documentation
@@ -31,21 +62,12 @@ https://kgtk.readthedocs.io/en/latest/

### KGTK Notebooks

The [examples folder](examples/) provides a larger and constantly increasing number of easy-to-follow Jupyter Notebooks which showcase different functionalities of KGTK. These include computing:
* Embeddings for ConceptNet nodes
* Graph statistics over a curated subset of Wikidata
* Reachable occupations for selected people in Wikidata
* PageRank over Wikidata
* etc.
For examples of using KGTK, please see our [Tutorial Notebooks](https://github.com/usc-isi-i2/kgtk-notebooks).

## Releases

* See all [source code releases](https://github.com/usc-isi-i2/kgtk/releases)

## Installation

Please see our [installation document](/docs/install.md) for installation procedures.

## KGTK Text Search API

The documentation for the KGTK Text Search API is [here](https://github.com/usc-isi-i2/kgtk-search)
4 changes: 4 additions & 0 deletions docker/Dockerfile
@@ -28,6 +28,10 @@ RUN conda install -c conda-forge jupyterlab

RUN pip install chardet

RUN pip install gensim

RUN pip install papermill

ARG NB_USER=jovyan
ARG NB_UID=1000
ENV USER ${NB_USER}
4 changes: 4 additions & 0 deletions docker/dev/Dockerfile
@@ -28,6 +28,10 @@ RUN conda install -c conda-forge jupyterlab

RUN pip install chardet

RUN pip install gensim

RUN pip install papermill

ARG NB_USER=jovyan
ARG NB_UID=1000
ENV USER ${NB_USER}
2 changes: 1 addition & 1 deletion kgtk/__init__.py
@@ -1 +1 @@
-__version__ = '1.0.0'
+__version__ = '1.0.1'
33 changes: 31 additions & 2 deletions kgtk/functions.py
@@ -23,6 +23,7 @@ def kgtk(arg1: typing.Union[str, pandas.DataFrame],
auto_display_html: typing.Optional[bool] = None,
auto_display_json: typing.Optional[bool] = None,
auto_display_md: typing.Optional[bool] = None,
unquote_column_names: typing.Optional[bool] = None,
bash_command: typing.Optional[str] = None,
kgtk_command: typing.Optional[str] = None,
)->typing.Optional[pandas.DataFrame]:
@@ -60,6 +61,10 @@ def kgtk(arg1: typing.Union[str, pandas.DataFrame],
This parameter controls the processing of MarkDown output. See below.
unquote_column_names=True/False (default True)
Convert string column names to symbols.
bash_command=CMD (default 'bash')
This parameter specifies the name of the shell interpreter. If the
@@ -117,11 +122,12 @@ def kgtk(arg1: typing.Union[str, pandas.DataFrame],
Environment Variables
=========== =========
-This modeule directly uses the following environment variables:
+This module directly uses the following environment variables:
KGTK_AUTO_DISPLAY_HTML
KGTK_AUTO_DISPLAY_JSON
KGTK_AUTO_DISPLAY_MD
KGTK_UNQUOTE_COLUMN_NAMES
KGTK_BASH_COMMAND
KGTK_KGTK_COMMAND
@@ -142,6 +148,8 @@ def kgtk(arg1: typing.Union[str, pandas.DataFrame],
    auto_display_json = os.getenv("KGTK_AUTO_DISPLAY_JSON", "true").lower() in ["true", "yes", "y"]
if auto_display_md is None:
    auto_display_md = os.getenv("KGTK_AUTO_DISPLAY_MD", "false").lower() in ["true", "yes", "y"]
if unquote_column_names is None:
    unquote_column_names = os.getenv("KGTK_UNQUOTE_COLUMN_NAMES", "true").lower() in ["true", "yes", "y"]

# Why not os.getenv("KGTK_BASH_COMMAND", "bash")? Splitting it up makes
# mypy happier.
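
The pattern in this hunk (an explicit argument wins, otherwise an environment variable is consulted, otherwise a built-in default applies) can be sketched in isolation as follows. This is an illustrative sketch, not code from the commit: `resolve_flag` and `TRUE_VALUES` are hypothetical names, and only the accepted spellings `true`, `yes`, and `y` are taken from the diff.

```python
import os
from typing import Optional

# Spellings treated as "true", matching the list used in the diff.
TRUE_VALUES = ("true", "yes", "y")


def resolve_flag(value: Optional[bool], env_name: str, default: str) -> bool:
    """Resolve a tri-state option: explicit argument, then environment, then default."""
    if value is not None:
        # The caller passed True or False explicitly, so the environment is ignored.
        return value
    # Fall back to the environment variable, compared case-insensitively.
    return os.getenv(env_name, default).lower() in TRUE_VALUES
```

Under this sketch, `resolve_flag(None, "KGTK_UNQUOTE_COLUMN_NAMES", "true")` would mirror the behavior added for `unquote_column_names`: any value other than the three accepted spellings, including `1` or `on`, counts as false.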
@@ -189,6 +197,18 @@ def kgtk(arg1: typing.Union[str, pandas.DataFrame],
doublequote=False,
escapechar='\\',
)
if unquote_column_names:
    # Pandas will have treated the column names as strings and quoted
    # them. By convention, KGTK column names are symbols. So, we will
    # remove double quotes from the outside of each column name.
    #
    # TODO: Handle the troublesome case of a double quote inside a column
    # name.
    header, body = in_tsv.split('\n', 1)
    column_names = header.split('\t')
    column_names = [x[1:-1] if x.startswith('"') else x for x in column_names]
    header = "\t".join(column_names)
    in_tsv = header + "\n" + body

# Execute the KGTK command pipeline:
outbuf: StringIO = StringIO()
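
The header-unquoting step in this hunk can be tried on its own with a small standalone function. This is a sketch under stated assumptions rather than the committed code: `unquote_header` is a hypothetical name, and unlike the diff it also requires a closing quote before stripping and tolerates input that has no newline. Like the diff, it does not handle a double quote inside a column name.

```python
def unquote_header(tsv: str) -> str:
    """Strip one pair of enclosing double quotes from each column name in the header row."""
    # partition() keeps the separator, so input without a newline is returned unchanged in shape.
    header, sep, body = tsv.partition("\n")
    names = [
        name[1:-1]
        if len(name) >= 2 and name.startswith('"') and name.endswith('"')
        else name
        for name in header.split("\t")
    ]
    # Only the header row is rewritten; data rows keep their quoting.
    return "\t".join(names) + sep + body
```

For instance, a header of `"node1"`, `"label"`, `"node2"` becomes `node1`, `label`, `node2`, while quoted values in the rows below it are left untouched.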
@@ -247,11 +267,20 @@ def kgtk(arg1: typing.Union[str, pandas.DataFrame],
# Assume that anything else is KGTK formatted output. Convert it to a
# pandas DataFrame and return it.
#
# TODO: Test this conversion with all KGTK datatypes. Language-qualified
# strings are problematic. Check what happens to quantities, date/times,
# and locations.
#
# TODO: Remove the escape character from internal `|` characters?
# If we do that, should we detect KGTK lists and complain?
# `\|` -> `|`
outbuf.seek(0)
-result = pandas.read_csv(outbuf, sep='\t')
+result = pandas.read_csv(outbuf,
+                         sep='\t',
+                         quotechar='"',
+                         doublequote=False,
+                         escapechar='\\',
+                         )

outbuf.close()
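
The quoting convention this change applies on the read side (a double quote as the quote character, with embedded quotes escaped by a backslash rather than doubled) can be illustrated with a small round trip. The sketch below uses the standard library `csv` module as a stand-in because it accepts options with the same names; it is not the pandas code path, and pandas may differ in edge cases. The helper names `write_tsv` and `read_tsv` are hypothetical, and quoting every string on output is an assumption.

```python
import csv
from io import StringIO

# Same option names as in the diff, applied to the csv module instead of pandas.
DIALECT = dict(delimiter="\t", quotechar='"', doublequote=False, escapechar="\\")


def write_tsv(rows):
    """Serialize rows as TSV, quoting strings and backslash-escaping embedded quotes."""
    buf = StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n", **DIALECT)
    writer.writerows(rows)
    return buf.getvalue()


def read_tsv(text):
    """Parse TSV text using the same quote and escape settings."""
    return list(csv.reader(StringIO(text), **DIALECT))


rows = [["node1", "label", "node2"], ["Q1", "name", 'say "hi"']]
text = write_tsv(rows)
```

Reading `text` back with `read_tsv` recovers the original rows, which is the property the matching reader and writer settings are meant to preserve. A reader left on its default of doubled quotes would not interpret the backslash escapes the same way.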
