<a href="https://colab.research.google.com/github/wey-gu/jupyter_nebulagraph/blob/main/docs/get_started.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This guide walks through the end-to-end process of working with NebulaGraph from a Jupyter Notebook.

### Prerequisites

We need a running NebulaGraph cluster. If you don't have one, you can use [NebulaGraph-Lite](https://github.com/wey-gu/nebulagraph-lite/) to spawn an ad-hoc cluster. For more options, refer to the [NebulaGraph Docs](https://docs.nebula-graph.io).

> For more NebulaGraph installation options, see [NebulaGraph Installation Options](https://jupyter-nebulagraph.readthedocs.io/en/latest/installation/#nebulagraph-installation-options) in the jupyter-nebulagraph documentation.

In [None]:
%pip install nebulagraph-lite

In [None]:
from nebulagraph_lite import nebulagraph_let as ng_let

n = ng_let()

# This takes around 5 mins
n.start()

Streaming output truncated to the last 5000 lines.
...var/lib/rpm/__db.002
var/lib/rpm/__db.003
2ccae830-d547-3d25-8d86-c8e24b20d62e
Debug: using curl executable 
Debug: Localrepo homedir is /home/user/.udocker
Debug: using curl executable 
Debug: already installed, installation skipped
SHOW TAGS: ResultSet(None)
Info: downloading layer sha256:73dde089847b2e7be0b3e12a438aa50dab9d29587a3f37512daf74271d6f7eb1
Info: downloading layer sha256:8a5d5aed99ca3dd1343afcb47ea7136fa6350a36a31c12ebc62a6a92ddf1c7ef
Info: downloading layer sha256:19bc7f3f0d802b7e8dc89786cfa0e18ab81bc501d90e1d24720d470e4e213c03
Info: downloading layer sha256:7264a8db6415046d36d16ba98b79778e18accee6ffa71850405994cffa9be7de
Info: loading basketballplayer dataset...
  _   _      _           _        ____                 _     
 | \ | | ___| |__  _   _| | __ _ / ___|_ __ __ _ _ __ | |__  
 |  \| |/ _ | '_ \| | | | |/ _` | |  _| '__/ _

### Installation

First, install with pip:

```bash
%pip install jupyter_nebulagraph
```

Second, load the extension:

```bash
%load_ext ngql
```

In [None]:
%pip install jupyter_nebulagraph

In [1]:
%load_ext ngql

### Connect to NebulaGraph

Connect with:

```bash
%ngql --address <ip> --port <port> --user <username> --password <password>
```

By default, the spaces of the cluster are printed after connecting.

In [2]:
%ngql --address 127.0.0.1 --port 9669 --user root --password nebula

Connection Pool Created


Unnamed: 0,Name
0,demo_basketballplayer
1,freebase_15k
2,nba
3,news
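
To avoid hard-coding credentials in a notebook cell, the connection arguments can be assembled in Python first. The following is a minimal sketch, not part of jupyter_nebulagraph: the helper name and the `NEBULA_*` environment variable names are assumptions made for illustration, and the defaults mirror the example cell above. It assumes the values contain no whitespace.

```python
import os


def build_ngql_connect_args(env=os.environ):
    """Compose the argument string for the %ngql connection magic.

    The NEBULA_* variable names are hypothetical; pick any names you like.
    """
    address = env.get("NEBULA_ADDRESS", "127.0.0.1")
    port = env.get("NEBULA_PORT", "9669")
    user = env.get("NEBULA_USER", "root")
    password = env.get("NEBULA_PASSWORD", "nebula")
    parts = ["--address", address, "--port", str(port),
             "--user", user, "--password", password]
    # Values are joined with single spaces, so they must not contain whitespace.
    return " ".join(parts)


# Only NEBULA_ADDRESS is set here; the other values fall back to the defaults.
args = build_ngql_connect_args({"NEBULA_ADDRESS": "127.0.0.1"})
print(args)

# Inside a notebook, the same string can be handed to the magic via IPython:
# get_ipython().run_line_magic("ngql", args)
```

`run_line_magic` is the standard IPython way to invoke a line magic from code, so the commented line is equivalent to typing `%ngql` followed by the arguments in a cell.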


## Query

Once `Connection Pool Created` is shown by the connection step above, we can run queries:


### One-line Query `%ngql`

Option 1: run a single query on one line:
```ngql
%ngql <query_line>;
```

In [17]:
%ngql USE basketballplayer;
%ngql MATCH (v:player{name:"Tim Duncan"})-->(v2:player) RETURN v2.player.name AS Name;

Unnamed: 0,Name
0,Tony Parker
1,Manu Ginobili


### Multiline Query `%%ngql`

Option 2: run multiple queries in one cell:

```ngql
%%ngql
<line 0>;
<line 1>;
```

In [4]:
%%ngql
USE basketballplayer;
SUBMIT JOB STATS;
SHOW STATS;

Unnamed: 0,Type,Name,Count
0,Tag,player,54
1,Tag,team,30
2,Edge,follow,82
3,Edge,serve,146
4,Space,vertices,84
5,Space,edges,228


### Cheatsheet

The only takeaway should be:

You can always run `%ngql help` for details on the supported magics and for examples you can copy.

```ngql
%ngql help
```

### Using Variables in Query String

We use [Jinja2](https://jinja.palletsprojects.com/) as the templating method for variables in the query string:

```python
trainer = "Sue"
```

```ngql
%%ngql
GO FROM "{{ trainer }}" OVER owns_pokemon YIELD owns_pokemon._dst as pokemon_id | GO FROM $-.pokemon_id OVER owns_pokemon REVERSELY YIELD owns_pokemon._dst AS Trainer_Name;
```

In [5]:
vid = "player100"

In [6]:
%%ngql
MATCH (v)<-[e:follow]- (v2)-[e2:serve]->(v3)
  WHERE id(v) == "{{ vid }}"
RETURN v2.player.name AS FriendOf, v3.team.name AS Team LIMIT 3;

Unnamed: 0,FriendOf,Team
0,Boris Diaw,Spurs
1,Boris Diaw,Jazz
2,Boris Diaw,Suns
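
To make the substitution step concrete, here is a rough standard-library approximation of what happens to a `{{ name }}` placeholder before the query is sent. It is not Jinja2 itself (which also supports expressions, filters and control flow); `render_query` is a hypothetical helper written only for this illustration.

```python
import re


def render_query(template: str, variables: dict) -> str:
    """Replace each ``{{ name }}`` placeholder with the value of ``name``.

    A simplified stand-in for Jinja2 rendering: plain variable names only.
    """
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable in query template: {name}")
        return str(variables[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


vid = "player100"
query = render_query('MATCH (v) WHERE id(v) == "{{ vid }}" RETURN v;', {"vid": vid})
print(query)
```

Note that the value is pasted in as text, so string values still need the surrounding double quotes in the template, as in the cells above.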


## Result Handling

By default, the query result is a pandas DataFrame, which we can access by reading the variable `_`.

### DataFrame Result (default)

For instance:

In [7]:
df = _

In [8]:
df

Unnamed: 0,FriendOf,Team
0,Boris Diaw,Spurs
1,Boris Diaw,Jazz
2,Boris Diaw,Suns


### Tweaking Raw Result (Optional)

By default, `ngql_result_style` is `pandas`, which lets Jupyter Notebook render the result as a table.

If you would like to get the raw results from `nebula3-python` itself, configure it on the fly as below:

```
%config IPythonNGQL.ngql_result_style="raw"
```

After querying, the result is stored in `_`. Assign it to a new variable for further ad-hoc tweaking, like:
```
%ngql <query>;

result = _

dir(result)
```

In [9]:
%config IPythonNGQL.ngql_result_style="raw"

In [11]:
%%ngql
USE demo_basketballplayer;
GO 2 STEPS FROM "player102" OVER follow YIELD dst(edge);

ResultSet(keys: ['dst(EDGE)'], values: ["player100"],["player102"],["player125"],["player101"],["player125"])

In [12]:
r = _

In [13]:
r.column_values("dst(EDGE)")[0].cast()

'player100'
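
The same `column_values(...)` and `.cast()` calls can be wrapped to unwrap a whole column at once. The sketch below is an illustration under assumptions: `FakeValue` and `FakeResultSet` are hypothetical stand-ins that mimic only those two methods so the snippet runs without a cluster. In a notebook you would pass the real raw result (`r` above) to `column_as_list` instead.

```python
class FakeValue:
    """Hypothetical stand-in for a wrapped value; only mimics .cast()."""

    def __init__(self, value):
        self._value = value

    def cast(self):
        return self._value


class FakeResultSet:
    """Hypothetical stand-in for a raw result; only mimics .column_values()."""

    def __init__(self, columns):
        self._columns = columns

    def column_values(self, key):
        return [FakeValue(v) for v in self._columns[key]]


def column_as_list(result, key):
    """Unwrap every value of one result column into a plain Python object."""
    return [value.cast() for value in result.column_values(key)]


# Same values as the raw result printed above.
r = FakeResultSet(
    {"dst(EDGE)": ["player100", "player102", "player125", "player101", "player125"]}
)
destinations = column_as_list(r, "dst(EDGE)")
print(sorted(set(destinations)))
```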

Now we change `ngql_result_style` back to `pandas`:

In [14]:
%config IPythonNGQL.ngql_result_style="pandas"

In [16]:
%%ngql
GO FROM "player100", "player102" OVER serve
  WHERE properties(edge).start_year > 1995
YIELD DISTINCT properties($$).name AS team_name, properties(edge).start_year AS start_year, properties($^).name AS player_name;

Unnamed: 0,team_name,start_year,player_name
0,Spurs,1997,Tim Duncan
1,Trail Blazers,2006,LaMarcus Aldridge
2,Spurs,2015,LaMarcus Aldridge


## Load Data from CSV

Since 0.9.0, data can be loaded into NebulaGraph directly from CSV files.

The source can be a local path or a URL:

In [18]:
%ng_load --source https://github.com/wey-gu/ipython-ngql/raw/main/examples/actor.csv --tag player --vid 0 --props 1:name,2:age --space basketballplayer

Parsed 3 vertices 'demo_basketballplayer' for tag 'player' in memory


Loading Vertices:   0%|          | 0/1 [00:00<?, ?it/s]

Loaded 3 of 3 vertices
Successfully loaded 3 vertices 'demo_basketballplayer' for tag 'player'


### `%ng_load` docs

The `%ng_load` magic command loads data from CSV files into NebulaGraph as vertices or edges. It lets you import data directly within a Jupyter Notebook environment.


#### Usage

```
%ng_load --source <source> [--header] --space <space> [--tag <tag>] [--vid <vid>] [--edge <edge>] [--src <src>] [--dst <dst>] [--rank <rank>] [--props <props>] [-b <batch>]
```

#### Arguments

- `--header` (Optional): Indicates that the CSV file contains a header row. If this flag is set, the first row of the CSV is treated as column headers.
- `-n`, `--space` (Required): Specifies the name of the NebulaGraph space where the data will be loaded.
- `-s`, `--source` (Required): The file path or URL to the CSV file. Supports both local paths and remote URLs.
- `-t`, `--tag`: The tag name for vertices. Required if loading vertex data.
- `--vid`: The column index for the vertex ID. Required if loading vertex data.
- `-e`, `--edge`: The edge type name. Required if loading edge data.
- `--src`: The column index for the source vertex ID when loading edges.
- `--dst`: The column index for the destination vertex ID when loading edges.
- `--rank` (Optional): The column index for the rank value of edges. Default is None.
- `--props` (Optional): Comma-separated column indexes for mapping to properties. The format for mapping is `column_index:property_name`.
- `-b`, `--batch` (Optional): Batch size for data loading. Default is 256.
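
The column-index arguments are easier to read with a small worked example. The sketch below shows how a `--vid` index and a `--props` spec such as `1:name,2:age` pick values out of CSV rows. It only illustrates the mapping; it is not the `%ng_load` implementation. The helper names and the two sample rows are made up for this example.

```python
import csv
import io


def parse_props(spec: str) -> dict:
    """Turn a spec like '1:name,2:age' into {1: 'name', 2: 'age'}."""
    mapping = {}
    for item in spec.split(","):
        index, name = item.split(":", 1)
        mapping[int(index)] = name.strip()
    return mapping


def rows_to_vertices(csv_text, vid_index, props_spec, has_header=False):
    """Return (vid, properties) pairs, using zero-based column indexes."""
    props = parse_props(props_spec)
    reader = csv.reader(io.StringIO(csv_text))
    if has_header:
        next(reader, None)  # skip the header row, as --header would imply
    return [
        (row[vid_index], {name: row[i] for i, name in props.items()})
        for row in reader
    ]


# Made-up sample rows: vertex ID, name, age.
sample = "player999,Alice,30\nplayer998,Bob,41\n"
vertices = rows_to_vertices(sample, vid_index=0, props_spec="1:name,2:age")
for vid, properties in vertices:
    print(vid, properties)
```

This sketch leaves every value as a string; how `%ng_load` converts values to the property types of the tag is not covered here.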

#### Examples

Loading Vertices

To load vertex data from a local CSV file named actor.csv into the basketballplayer space with the player tag, where the vertex ID is in the first column, and the properties name and age are in the second and third columns, respectively:

In [19]:
%ng_load --source actor.csv --tag player --vid 0 --props 1:name,2:age --space basketballplayer

Parsed 3 vertices 'demo_basketballplayer' for tag 'player' in memory


Loading Vertices:   0%|          | 0/1 [00:00<?, ?it/s]

Loaded 3 of 3 vertices
Successfully loaded 3 vertices 'demo_basketballplayer' for tag 'player'


Loading Edges

To load edge data from a local CSV file named follow_with_rank.csv into the basketballplayer space with the follow edge type, where the source vertex ID is in the first column, the destination vertex ID is in the second column, the property degree is in the third column, and the rank is in the fourth column:

In [20]:
%ng_load --source follow_with_rank.csv --edge follow --src 0 --dst 1 --props 2:degree --rank 3 --space basketballplayer

Parsed 1 edges 'demo_basketballplayer' for edge type 'follow' in memory


Loading Edges:   0%|          | 0/1 [00:00<?, ?it/s]

Loaded 1 of 1 edges
Successfully loaded 1 edges 'demo_basketballplayer' for edge type 'follow'
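
For edges, the `--src`, `--dst`, `--rank` and `--props` columns map onto the parts of an nGQL `INSERT EDGE` statement, and `--batch` limits how many rows go into one statement. The sketch below illustrates that mapping under assumptions: how `%ng_load` builds its statements internally is not shown in this guide, the helper names are hypothetical, and the sample row is made up.

```python
def batched(rows, batch_size=256):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def edge_insert_statement(edge_type, prop_name, rows):
    """Build one INSERT EDGE statement from (src, dst, prop_value, rank) rows.

    No escaping is done here, so IDs must not contain double quotes.
    """
    values = ", ".join(
        f'"{src}"->"{dst}"@{rank}:({prop})' for src, dst, prop, rank in rows
    )
    return f"INSERT EDGE {edge_type}({prop_name}) VALUES {values};"


# Made-up row in the column order used above: src, dst, degree, rank.
rows = [("player999", "player998", 95, 1)]
statements = [edge_insert_statement("follow", "degree", b) for b in batched(rows)]
print(len(statements))
print(statements[0])
```

With a batch size of 256, a file with only a few rows fits into a single batch.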


## Draw nGQL queries `%ng_draw`

We can render the result of the last query as a graph with `%ng_draw`, thanks to the upstream project `pyvis`.

<img width="948" alt="ng_draw_demo" src="https://github.com/wey-gu/jupyter_nebulagraph/assets/1651790/02454358-8ec7-42a3-815b-e58298184514">

<img width="1142" alt="ng_draw_demo_1" src="https://github.com/wey-gu/jupyter_nebulagraph/assets/1651790/c17d7491-922a-4930-a49c-c55f4e2adee4">


Alternatively, pass a query directly with `%ng_draw <one_line_query>` or `%%ng_draw <multiline_query>` instead of drawing the result of the last query.


<img width="1142" alt="ng_draw_demo_1" src="https://github.com/wey-gu/jupyter_nebulagraph/assets/1651790/a6e3b2d4-0320-4287-bd2f-537cff77c1de">


In [None]:
%pip install pyvis

In [21]:
%ngql match p=(:player)-[]->() return p LIMIT 5

Unnamed: 0,p
0,"(""player148"" :player{age: 45, name: ""Jason Kid..."
1,"(""player148"" :player{age: 45, name: ""Jason Kid..."
2,"(""player148"" :player{age: 45, name: ""Jason Kid..."
3,"(""player148"" :player{age: 45, name: ""Jason Kid..."
4,"(""player148"" :player{age: 45, name: ""Jason Kid..."


In [None]:
##uncomment to draw
# %ng_draw

In [23]:
%ngql GET SUBGRAPH 2 STEPS FROM "player101" YIELD VERTICES AS nodes, EDGES AS relationships;

Unnamed: 0,nodes,relationships
0,"[(""player101"" :player{})]","[(""player101"")-[:serve@0{}]->(""team204""), (""pl..."
1,"[(""player102"" :player{}), (""player100"" :player...","[(""player102"")-[:serve@0{}]->(""team203""), (""pl..."
2,"[(""player144"" :player{}), (""player112"" :player...","[(""player144"")-[:serve@0{}]->(""team214""), (""pl..."


In [24]:
%ng_draw

<class 'pyvis.network.Network'> |N|=36 |E|=84

## Draw Graph Schema `%ng_draw_schema`

We can also quickly draw the schema with `%ng_draw_schema`, which samples every edge type to show what the graph looks like.

This example comes from a dataset/space called `demo_supplychain`. To get the datasets named `demo_*`, you can install NebulaGraph Studio and click Download to have them ingested into NebulaGraph in one minute.

<img width="1008" alt="ng_draw_schema_demo" src="https://github.com/wey-gu/ipython-ngql/assets/1651790/e851289f-0009-42e8-984e-416417f2af8b">


In [25]:
%ngql CREATE SPACE demo_supplychain(partition_num=1, replica_factor=1, vid_type=fixed_string(128));

In [None]:
!sleep 10
%ngql USE demo_supplychain

In [None]:
%%ngql

CREATE TAG IF NOT EXISTS car_model(name string, number string, year int, type string, engine_type string, size string, seats int);
CREATE TAG IF NOT EXISTS feature(name string, number string, type string, state string);
CREATE TAG IF NOT EXISTS `part`(name string, number string, price double, `date` string);
CREATE TAG IF NOT EXISTS supplier(name string, address string, contact string, phone_number string);
CREATE EDGE IF NOT EXISTS with_feature(version string);
CREATE EDGE IF NOT EXISTS is_composed_of(version string);
CREATE EDGE IF NOT EXISTS is_supplied_by(version string);

In [None]:
!sleep 10

In [None]:
%%ngql

INSERT VERTEX `car_model`(`name`, `number`, `year`, `type`, `engine_type`, `size`, `seats`) VALUES "m_1":("Model A", "001", 2023, "Sedan", "Gasoline", "Compact", 4), "m_2":("Model B", "002", 2023, "Coupe", "Electric", "Compact", 2), "m_3":("Model C", "003", 2022, "SUV", "Hybrid", "Large", 7), "m_4":("Model D", "004", 2022, "Truck", "Diesel", "Extra Large", 5), "m_5":("Model E", "005", 2021, "Sedan", "Electric", "Medium", 5), "m_6":("Model F", "006", 2021, "Convertible", "Gasoline", "Compact", 2), "m_7":("Model G", "007", 2023, "Crossover", "Hybrid", "Medium", 5), "m_8":("Model H", "008", 2020, "Hatchback", "Electric", "Compact", 4), "m_9":("Model I", "009", 2022, "Sedan", "Gasoline", "Large", 5), "m_10":("Model J", "010", 2021, "SUV", "Hybrid", "Extra Large", 7);
INSERT VERTEX `supplier`(`name`, `address`, `contact`, `phone_number`) VALUES "s_31":("Supplier A", "123 Street", "John Doe", "1234567890"), "s_32":("Supplier B", "456 Avenue", "Emily Smith", "0987654321"), "s_33":("Supplier C", "789 Boulevard", "Robert Brown", "1112233445"), "s_34":("Supplier D", "101 Place", "Maria Johnson", "2223344556"), "s_35":("Supplier E", "202 Drive", "Michael Williams", "3334455667"), "s_36":("Supplier F", "303 Lane", "Susan Miller", "4445566778"), "s_37":("Supplier G", "404 Road", "Chris Lee", "5556677889"), "s_38":("Supplier H", "505 Street", "Jane Wilson", "6667788990"), "s_39":("Supplier I", "606 Way", "Brian Anderson", "7778899001"), "s_40":("Supplier J", "707 Avenue", "Linda Hall", "8889900112");
INSERT VERTEX `feature`(`name`, `number`, `type`, `state`) VALUES "f_11":("Sunroof", "F001", "Optional", "Available"), "f_12":("Bluetooth", "F002", "Standard", "Available"), "f_13":("Navigation", "F003", "Optional", "N/A"), "f_14":("Heated Seats", "F004", "Standard", "Available"), "f_15":("Backup Camera", "F005", "Optional", "Available"), "f_16":("Leather Seats", "F006", "Standard", "Available"), "f_17":("Adaptive Cruise", "F007", "Optional", "Available"), "f_18":("Blind Spot Monitor", "F008", "Standard", "Available"), "f_19":("Remote Start", "F009", "Optional", "N/A"), "f_20":("Apple CarPlay", "F010", "Standard", "Available");
INSERT VERTEX `part`(`name`, `number`, `price`, `date`) VALUES "p_21":("Brake Pad", "P001", 50, "2023-01-01"), "p_22":("Engine", "P002", 2000, "2023-05-03"), "p_23":("Tire", "P003", 100, "2022-08-14"), "p_24":("Transmission", "P004", 1500, "2022-02-20"), "p_25":("Radiator", "P005", 250, "2022-06-15"), "p_26":("Window Glass", "P006", 60, "2021-11-23"), "p_27":("Battery", "P007", 120, "2023-03-09"), "p_28":("Headlight", "P008", 90, "2023-07-30"), "p_29":("Alternator", "P009", 180, "2022-09-04"), "p_30":("Air Filter", "P010", 20, "2023-04-22");
INSERT EDGE `with_feature`(`version`) VALUES "m_1"->"f_12":("1.0"), "m_2"->"f_13":("1.0"), "m_3"->"f_14":("1.1"), "m_4"->"f_15":("1.2"), "m_5"->"f_11":("1.0"), "m_6"->"f_12":("1.0"), "m_7"->"f_13":("1.0"), "m_8"->"f_14":("1.0"), "m_9"->"f_15":("1.0"), "m_10"->"f_11":("1.0"), "m_2"->"f_12":("1.0"), "m_3"->"f_13":("1.1"), "m_4"->"f_14":("1.2"), "m_5"->"f_15":("1.0"), "m_6"->"f_11":("1.0"), "m_7"->"f_12":("1.0"), "m_8"->"f_13":("1.0"), "m_9"->"f_14":("1.0"), "m_10"->"f_15":("1.0"), "m_1"->"f_11":("1.0"), "m_2"->"f_12":("1.2"), "m_3"->"f_13":("1.1"), "m_4"->"f_12":("1.0"), "m_5"->"f_15":("1.3"), "m_6"->"f_11":("1.2"), "m_7"->"f_14":("1.0"), "m_8"->"f_13":("1.1"), "m_9"->"f_15":("1.2"), "m_10"->"f_12":("1.1"), "m_1"->"f_13":("1.3"), "m_2"->"f_14":("1.0"), "m_3"->"f_11":("1.1"), "m_4"->"f_14":("1.0"), "m_5"->"f_15":("1.2"), "m_6"->"f_13":("1.0"), "m_7"->"f_12":("1.1"), "m_8"->"f_15":("1.1"), "m_9"->"f_11":("1.2"), "m_10"->"f_14":("1.3"), "m_2"->"f_11":("1.0"), "m_3"->"f_12":("1.1"), "m_5"->"f_14":("1.0"), "m_6"->"f_15":("1.1"), "m_8"->"f_12":("1.2"), "m_9"->"f_13":("1.0"), "m_1"->"f_15":("1.2"), "m_7"->"f_13":("1.3"), "m_4"->"f_11":("1.0"), "m_10"->"f_15":("1.1");
INSERT EDGE `is_composed_of`(`version`) VALUES "f_11"->"p_21":("1.0"), "f_12"->"p_22":("1.0"), "f_13"->"p_23":("1.1"), "f_14"->"p_24":("1.2"), "f_15"->"p_21":("1.0"), "f_16"->"p_22":("1.0"), "f_17"->"p_23":("1.0"), "f_18"->"p_24":("1.0"), "f_19"->"p_25":("1.0"), "f_20"->"p_26":("1.0"), "f_11"->"p_27":("1.0"), "f_12"->"p_28":("1.0"), "f_13"->"p_29":("1.1"), "f_14"->"p_30":("1.2"), "f_15"->"p_21":("1.0"), "f_16"->"p_22":("1.0"), "f_17"->"p_23":("1.0"), "f_18"->"p_24":("1.0"), "f_19"->"p_25":("1.0"), "f_20"->"p_26":("1.0");
INSERT EDGE `is_supplied_by`(`version`) VALUES "p_21"->"s_31":("1.0"), "p_22"->"s_32":("1.0"), "p_23"->"s_33":("1.1"), "p_24"->"s_34":("1.2"), "p_25"->"s_35":("1.0"), "p_26"->"s_36":("1.0"), "p_27"->"s_37":("1.0"), "p_28"->"s_38":("1.0"), "p_29"->"s_39":("1.1"), "p_30"->"s_40":("1.2"), "p_21"->"s_31":("1.0"), "p_22"->"s_32":("1.0"), "p_23"->"s_33":("1.1"), "p_24"->"s_34":("1.2"), "p_25"->"s_35":("1.0"), "p_26"->"s_36":("1.0"), "p_27"->"s_37":("1.0"), "p_28"->"s_38":("1.0"), "p_29"->"s_39":("1.1"), "p_30"->"s_40":("1.2");

In [26]:
%ng_draw_schema

<class 'pyvis.network.Network'> |N|=4 |E|=3
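
The `|N|=4 |E|=3` summary can be reproduced by hand from one sample edge per edge type. The sketch below is an illustration of the idea, not the `%ng_draw_schema` implementation: `schema_graph` is a hypothetical helper, and the sample edges and vertex tags are taken from the `INSERT` statements above.

```python
def schema_graph(sample_edges, vertex_tags):
    """Derive tag-level nodes and edges from one sample edge per edge type."""
    nodes, edges = set(), set()
    for edge_type, (src, dst) in sample_edges.items():
        src_tag, dst_tag = vertex_tags[src], vertex_tags[dst]
        nodes.update([src_tag, dst_tag])
        edges.add((src_tag, edge_type, dst_tag))
    return nodes, edges


# One sample edge per edge type, copied from the inserted data.
sample_edges = {
    "with_feature": ("m_1", "f_12"),
    "is_composed_of": ("f_11", "p_21"),
    "is_supplied_by": ("p_21", "s_31"),
}
# Tag of each sampled vertex, as inserted above.
vertex_tags = {
    "m_1": "car_model",
    "f_11": "feature",
    "f_12": "feature",
    "p_21": "part",
    "s_31": "supplier",
}

nodes, edges = schema_graph(sample_edges, vertex_tags)
print(f"|N|={len(nodes)} |E|={len(edges)}")  # → |N|=4 |E|=3
```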

## `%ngql help`!

Again, all you have to remember is to use `%ngql help` to get all the hints :-)

In [27]:
%ngql help



        Supported Configurations:
        ------------------------
        
        > How to config ngql_result_style in "raw", "pandas"
        %config IPythonNGQL.ngql_result_style="raw"
        %config IPythonNGQL.ngql_result_style="pandas"

        > How to config ngql_verbose in True, False
        %config IPythonNGQL.ngql_verbose=True

        > How to config max_connection_pool_size
        %config IPythonNGQL.max_connection_pool_size=10

        Quick Start:
        -----------

        > Connect to Neubla Graph
        %ngql --address 127.0.0.1 --port 9669 --user user --password password

        > Use Space
        %ngql USE basketballplayer

        > Query
        %ngql SHOW TAGS;

        > Multile Queries
        %%ngql
        SHOW TAGS;
        SHOW HOSTS;

        Reload ngql Magic
        %reload_ext ngql

        > Variables in query, we are using Jinja2 here
        name = "nba"
        %ngql USE "{{ name }}"

        > Query and draw the graph

        %ngql GET 