# Hosting Large Knowledge Graphs with Neo4j Community Edition
We provide two ways to host the Neo4j server:
1. From OneDrive zip
    - Download the zipped server, with the data included, from OneDrive.
    - Unzip it and run `neo4j-server-{dataset-name}/bin/neo4j start` to host the server.
2. From source
    - We provide the node, edge, and text CSV files so that you can generate the embeddings and numeric ids yourself and build the server from source.

# From OneDrive
You can download the zipped server from our OneDrive [link](https://hkustconnect-my.sharepoint.com/:f:/g/personal/jbai_connect_ust_hk/EgJCqoU91KpAlSSOi6dzgccB6SCL4YBpsCyEtGiRBV4WNg): ATLAS Neo4j Server Zip.

Unzip the archive and start the server:
```shell
neo4j-server-{dataset-name}/bin/neo4j start
```
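Once the server has been started, one quick way to confirm that it accepts connections is to probe its Bolt port. The following is a minimal sketch using only the Python standard library. It assumes the server runs locally on Neo4j's default Bolt port 7687, which may not match the port set in the shipped `neo4j.conf`; the helper name `port_is_open` is illustrative.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: local server on the default Bolt port; adjust to your neo4j.conf.
    host, bolt_port = "localhost", 7687
    state = "reachable" if port_is_open(host, bolt_port) else "not reachable yet"
    print(f"Bolt port {bolt_port} on {host} is {state}")
```

A successful TCP connection only shows that something is listening on the port; it does not verify credentials or that the database has finished starting.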


# Build From Source

This notebook demonstrates how to host and work with large knowledge graphs using Neo4j Community Edition.

## Setup Instructions

1. Download and install Neo4j Community Edition (CE) using the provided scripts:
   - Run `cd neo4j_scripts` and then `sh get_neo4j_cc.sh`, which will:
     - Download Neo4j CE
     - Install the required plugins (APOC, GDS)
     - Configure ports and passwords
     - Initialize the database
   - Run `sh get_neo4j_pes2o.sh` and `sh get_neo4j_wiki.sh` as well.

2. Key Configuration Details:
   - Default credentials:
     - Username: neo4j
     - Password: admin2024



Copy the `AutoschemaKG/neo4j_scripts/neo4j.conf` file to the conf directory of the Neo4j server (`neo4j-server-dulce/conf`). Then update the following settings as needed:

1. Set `dbms.default_database` to the desired dataset name, such as `wiki-csv-json-text`, `pes2o-csv-json-text`, or `cc-csv-json-text`.

2. Configure the Bolt, HTTP, and HTTPS connectors according to your requirements. If you want to run several servers at the same time, you must give each one different ports to avoid port conflicts.
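The two configuration steps above can be scripted. The sketch below is one possible way to do it, not the project's own tooling: `set_conf_values` is a hypothetical helper, the port numbers are illustrative, and the connector keys `server.bolt.listen_address` and `server.http.listen_address` are assumed to follow Neo4j 5 naming, so check them against the keys that actually appear in your `neo4j.conf`.

```python
import re

# Matches "key=value" lines, including commented-out ones such as "#key=value".
KEY_RE = re.compile(r"^\s*#?\s*([\w.]+)\s*=")


def set_conf_values(conf_text: str, updates: dict[str, str]) -> str:
    """Return conf_text with every key in `updates` set exactly once."""
    out, written = [], set()
    for line in conf_text.splitlines():
        match = KEY_RE.match(line)
        key = match.group(1) if match else None
        if key in updates:
            if key not in written:
                out.append(f"{key}={updates[key]}")
                written.add(key)
            continue  # drop later duplicates of a key we already set
        out.append(line)
    # Append keys that did not appear in the file at all.
    out.extend(f"{k}={v}" for k, v in updates.items() if k not in written)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "#dbms.default_database=neo4j\nserver.bolt.listen_address=:7687\n"
    print(set_conf_values(sample, {
        "dbms.default_database": "wiki-csv-json-text",
        "server.bolt.listen_address": ":7688",  # illustrative port
        "server.http.listen_address": ":7475",  # illustrative port
    }), end="")
```

To apply it to a real file, read the text with `pathlib.Path(...).read_text()`, pass it through `set_conf_values`, and write the result back after keeping a backup copy.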


# Import Data
We use the `neo4j-admin` import method, which is the fastest way to load the data; other methods are too slow for graphs of this size. Download all the zip files from our OneDrive [link](https://hkustconnect-my.sharepoint.com/:f:/g/personal/jbai_connect_ust_hk/EgJCqoU91KpAlSSOi6dzgccB6SCL4YBpsCyEtGiRBV4WNg): ATLAS Neo4j Dump. [Here](https://sushantag9.medium.com/download-data-from-onedrive-using-command-line-d27196a676d9) is a tutorial on downloading large files from OneDrive using the command line. You can run `decompress_csv_files.sh` to decompress all the zips in parallel into the `decompressed` directory. Make sure you have enough disk space to import the databases for these servers; the rest of this section assumes the directory layout shown below.
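If you prefer to decompress the archives from Python rather than with the shell script, the following is a minimal sketch of parallel extraction using a thread pool. It is not the contents of `decompress_csv_files.sh`; the function names, the worker count, and the directory names you pass in are all assumptions for illustration.

```python
import tempfile
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_zip(zip_path: Path, out_dir: Path) -> str:
    """Extract one archive into out_dir and return its file name."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    return zip_path.name


def extract_all(zip_dir: str, out_dir: str, workers: int = 4) -> list[str]:
    """Extract every *.zip in zip_dir into out_dir using worker threads."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    archives = sorted(Path(zip_dir).glob("*.zip"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the archives.
        return list(pool.map(lambda p: extract_zip(p, target), archives))
```

For example, `extract_all("./downloads", "./decompressed")` would extract every zip found in a hypothetical `./downloads` directory.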

The listing below shows the disk space used by each server after import. Put the decompressed files into `./import`.
```
342G    ./neo4j-server-wiki
907G    ./neo4j-server-cc
249G    ./neo4j-server-pes2o
2.3T    ./import 
```
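Before starting a multi-hour import, it can help to compare the sizes listed above with the free space on the target disk. The sketch below assumes the sizes are binary units as printed by `du -h`, and that all four directories live on the same filesystem as the current working directory; `parse_size` and `has_room` are illustrative helper names.

```python
import shutil

# Binary multipliers, assuming `du -h` style suffixes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size such as '342G' or '2.3T' to a number of bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(float(text))


def has_room(path: str, needed: str) -> bool:
    """Return True if the filesystem holding `path` has at least `needed` free."""
    return shutil.disk_usage(path).free >= parse_size(needed)


if __name__ == "__main__":
    # Sizes copied from the listing above.
    needed = {
        "neo4j-server-wiki": "342G",
        "neo4j-server-cc": "907G",
        "neo4j-server-pes2o": "249G",
        "import": "2.3T",
    }
    total = sum(parse_size(size) for size in needed.values())
    free = shutil.disk_usage(".").free
    print(f"needed {total / 1024**4:.2f} TiB, free {free / 1024**4:.2f} TiB")
```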



```python
# You can SKIP this step, which can take several hours.
# The zips already include the CSVs with numeric ids.

# We need to add numeric ids to these CSV files before loading them into the database.
# The ids are needed because we use an external FAISS index rather than the built-in
# Neo4j vector index, which does not handle billion-scale vector collections well.

from atlas_rag.utils.csv_add_column import add_csv_columns

decompressed_dir = "/data/jbai/autoschema_servers/import"
for filename_pattern in ["cc_en", "en_simple_wiki_v0", "pes2o_abstract"]:
    add_csv_columns(
        node_csv=f"{decompressed_dir}/triple_nodes_{filename_pattern}_from_json_without_emb.csv",
        edge_csv=f"{decompressed_dir}/triple_edges_{filename_pattern}_from_json_without_emb_full_concept.csv",
        text_csv=f"{decompressed_dir}/text_nodes_{filename_pattern}_from_json.csv",
        node_with_numeric_id=f"{decompressed_dir}/triple_nodes_{filename_pattern}_from_json_without_emb_with_numeric_id.csv",
        edge_with_numeric_id=f"{decompressed_dir}/triple_edges_{filename_pattern}_from_json_without_emb_full_concept_with_numeric_id.csv",
        text_with_numeric_id=f"{decompressed_dir}/text_nodes_{filename_pattern}_from_json_with_numeric_id.csv",
    )
```
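To illustrate the general idea of attaching numeric ids to CSV rows, here is a minimal sketch that appends a sequential integer column while streaming a file. It is not the implementation of `add_csv_columns`, whose internals are not shown here; the column name `numeric_id`, the sample header, and the sample rows are hypothetical. A sequential integer per row is presumably what lets a position in the external FAISS index be mapped back to a node or text in the graph.

```python
import csv
import io
from typing import TextIO


def add_numeric_id(src: TextIO, dst: TextIO,
                   column: str = "numeric_id", start: int = 0) -> int:
    """Copy a CSV from src to dst, appending a sequential integer column.

    Returns the next unused id so numbering can continue across files.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return start
    writer.writerow(header + [column])
    next_id = start
    for row in reader:
        writer.writerow(row + [next_id])
        next_id += 1
    return next_id


if __name__ == "__main__":
    # Hypothetical two-row node file, for illustration only.
    source = io.StringIO("name:ID,type\nAlex Mercer,entity\nGentek,entity\n")
    target = io.StringIO()
    add_numeric_id(source, target)
    print(target.getvalue(), end="")
```

Because rows are streamed one at a time, memory use stays flat regardless of file size, which matters for CSVs of this scale.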




For the wiki graph:

``` shell
./neo4j-server-wiki/bin/neo4j-admin database import full wiki-csv-json-text \
    --nodes=./import/text_nodes_en_simple_wiki_v0_from_json_with_numeric_id.csv \
    ./import/triple_nodes_en_simple_wiki_v0_from_json_without_emb_with_numeric_id.csv \
    ./import/concept_nodes_en_simple_wiki_v0_from_json_without_emb.csv \
    --relationships=./import/text_edges_en_simple_wiki_v0_from_json.csv \
    ./import/triple_edges_en_simple_wiki_v0_from_json_without_emb_full_concept_with_numeric_id.csv \
    ./import/concept_edges_en_simple_wiki_v0_from_json_without_emb.csv \
    --overwrite-destination \
    --multiline-fields=true \
    --id-type=string \
    --verbose --skip-bad-relationships=true
```

For the pes2o graph:
``` shell

./neo4j-server-pes2o/bin/neo4j-admin database import full pes2o-csv-json-text \
    --nodes=./import/text_nodes_pes2o_abstract_from_json_with_numeric_id.csv \
    ./import/triple_nodes_pes2o_abstract_from_json_without_emb_with_numeric_id.csv \
    ./import/concept_nodes_pes2o_abstract_from_json_without_emb.csv \
    --relationships=./import/text_edges_pes2o_abstract_from_json.csv \
    ./import/triple_edges_pes2o_abstract_from_json_without_emb_full_concept_with_numeric_id.csv \
    ./import/concept_edges_pes2o_abstract_from_json_without_emb.csv \
    --overwrite-destination \
    --multiline-fields=true \
    --verbose --skip-bad-relationships=true --bad-tolerance=100000
```

For the cc graph:
``` shell
./neo4j-server-cc/bin/neo4j-admin database import full cc-csv-json-text \
    --nodes=./import/text_nodes_cc_en_from_json_with_numeric_id.csv \
    ./import/triple_nodes_cc_en_from_json_without_emb_with_numeric_id.csv \
    ./import/concept_nodes_cc_en_from_json_without_emb.csv \
    --relationships=./import/text_edges_cc_en_from_json.csv \
    ./import/triple_edges_cc_en_from_json_without_emb_full_concept_with_numeric_id.csv \
    ./import/concept_edges_cc_en_from_json_without_emb.csv \
    --overwrite-destination \
    --multiline-fields=true \
    --verbose --skip-bad-relationships=true

```
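The three commands above follow one pattern and differ only in the server directory, the database name, the file-name pattern, and a few extra flags. The sketch below builds the argument list for any of the three datasets so the pattern is easier to see. It only prints the command and does not run it; `DATASETS` and `build_import_command` are illustrative names, and the option order differs slightly from the commands above, which is assumed not to matter to `neo4j-admin`.

```python
import shlex

# Dataset key -> (server directory, database name, CSV file-name pattern),
# taken from the three commands above.
DATASETS = {
    "wiki": ("./neo4j-server-wiki", "wiki-csv-json-text", "en_simple_wiki_v0"),
    "pes2o": ("./neo4j-server-pes2o", "pes2o-csv-json-text", "pes2o_abstract"),
    "cc": ("./neo4j-server-cc", "cc-csv-json-text", "cc_en"),
}


def build_import_command(dataset: str, import_dir: str = "./import",
                         extra_flags: tuple[str, ...] = ()) -> list[str]:
    """Return the neo4j-admin import argv for one dataset."""
    server_dir, database, pattern = DATASETS[dataset]
    nodes = [
        f"{import_dir}/text_nodes_{pattern}_from_json_with_numeric_id.csv",
        f"{import_dir}/triple_nodes_{pattern}_from_json_without_emb_with_numeric_id.csv",
        f"{import_dir}/concept_nodes_{pattern}_from_json_without_emb.csv",
    ]
    edges = [
        f"{import_dir}/text_edges_{pattern}_from_json.csv",
        f"{import_dir}/triple_edges_{pattern}_from_json_without_emb_full_concept_with_numeric_id.csv",
        f"{import_dir}/concept_edges_{pattern}_from_json_without_emb.csv",
    ]
    return [
        f"{server_dir}/bin/neo4j-admin", "database", "import", "full", database,
        f"--nodes={nodes[0]}", *nodes[1:],
        f"--relationships={edges[0]}", *edges[1:],
        "--overwrite-destination",
        "--multiline-fields=true",
        "--verbose",
        "--skip-bad-relationships=true",
        *extra_flags,
    ]


if __name__ == "__main__":
    # Extra flags as used in the wiki and pes2o commands above.
    print(shlex.join(build_import_command("wiki", extra_flags=("--id-type=string",))))
    print(shlex.join(build_import_command("pes2o", extra_flags=("--bad-tolerance=100000",))))
```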


# ATLAS RAG API hosting
After hosting the Neo4j server and obtaining the URI, username, and password of the Neo4j database, you can host the ATLAS RAG API with our provided package.

Because of how the FastAPI app is packaged, we currently support only one RAG endpoint at a time.

For example, run:
```shell
python neo4j_api_host/atlas_api.py
```

You can change `keyword = 'cc_en'` to another keyword to host either of the other two graphs, using the corresponding pre-built FAISS indexes: `node_index` and `text_index`.

```python
from openai import OpenAI

base_url = "http://0.0.0.0:10089/v1/"
client = OpenAI(api_key="EMPTY", base_url=base_url)

# knowledge graph en_simple_wiki_v0
message = [
    {
        "role": "system",
        "content": "You are a helpful assistant that answers questions based on the knowledge graph.",
    },
    {
        "role": "user",
        "content": "Question: Who is Alex Mercer?",
    }
]
response = client.chat.completions.create(
    model="llama",
    messages=message,
    max_tokens=2048,
    temperature=0.5,
    extra_body={
        "retriever_config": {  # configure based on the size of your knowledge graph
            "topN": 5,
            "number_of_source_nodes_per_ner": 1,
            "sampling_area": 10
        }
    }
)
print(response.choices[0].message.content)
```

Example output:

Alex Mercer is a fictional character and the protagonist of the video game "Prototype" developed by Radical Entertainment. He is a scientist who gains superhuman abilities after being infected with a virus known as the Blacklight virus.

In the game, Alex Mercer is a scientist working for a company called Gentek when he discovers that they are secretly experimenting with a deadly virus that has the power to rewrite human DNA. After being infected with the virus, Alex gains incredible abilities such as superhuman strength, agility, and durability. He also develops the ability to shape-shift and absorb the memories and abilities of others by consuming their biomass.

As Alex navigates a post-apocalyptic New York City filled with infected creatures and military forces trying to contain the outbreak, he sets out to uncover the truth behind the conspiracy that led to his transformation and to stop those responsible for unleashing the virus.

Throughout the game, Alex Mercer is portrayed as 