# Hosting Large Knowledge Graphs with Neo4j Community Edition

This notebook demonstrates how to host and work with large knowledge graphs using Neo4j Community Edition.

## Setup Instructions

1. Download and install Neo4j Community Edition (CE) using the provided script:
   - Run `cd neo4j_scripts` and then `sh get_neo4j_cc.sh`, which will:
     - Download Neo4j CE
     - Install required plugins (APOC, GDS)
     - Configure ports and passwords
     - Initialize the database

Run `sh get_neo4j_pes2o.sh` and `sh get_neo4j_wiki.sh` in the same way for the other two datasets.

2. Key Configuration Details:
   - Default credentials:
     - Username: neo4j
     - Password: admin2024



Copy the ```AutoschemaKG/neo4j_scripts/neo4j.conf``` file to the conf directory of the Neo4j server (```neo4j-server-dulce/conf```). Then update the following settings as needed:

1. Set `dbms.default_database` to the desired dataset name, such as ```wiki-csv-json-text```, ```pes2o-csv-json-text```, or ```cc-csv-json-text```.
2. Configure the Bolt, HTTP, and HTTPS connectors according to your requirements. If you want to run several servers at the same time, give each server different ports to avoid port conflicts.
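The port requirement can be checked before starting the servers. The following is a minimal sketch, not part of the provided scripts: the server directory names come from this notebook, but the port numbers and the helper `find_port_conflicts` are hypothetical placeholders that you would replace with the values from each server's `neo4j.conf`.

```python
from collections import defaultdict

# Hypothetical port assignments. The server names appear in this notebook,
# but the port numbers are placeholders, not values read from neo4j.conf.
SERVER_PORTS = {
    "neo4j-server-wiki": {"bolt": 7687, "http": 7474, "https": 7473},
    "neo4j-server-pes2o": {"bolt": 7688, "http": 7475, "https": 7476},
    "neo4j-server-cc": {"bolt": 7689, "http": 7477, "https": 7478},
}


def find_port_conflicts(server_ports):
    """Return {port: [(server, connector), ...]} for every port used more than once."""
    users = defaultdict(list)
    for server, connectors in server_ports.items():
        for connector, port in connectors.items():
            users[port].append((server, connector))
    return {port: who for port, who in users.items() if len(who) > 1}


conflicts = find_port_conflicts(SERVER_PORTS)
print(conflicts or "no port conflicts")
```

An empty result means every connector has its own port, so the servers can run side by side.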


## Import Data
We use the admin import method to import the data because it is the fastest option; other methods are too slow for graphs of this size. Download the data from our OneDrive [link](https://hkustconnect-my.sharepoint.com/:f:/g/personal/jbai_connect_ust_hk/EgJCqoU91KpAlSSOi6dzgccB6SCL4YBpsCyEtGiRBV4WNg): ATLAS Neo4j Dump. Please download all the zip files. [Here](https://sushantag9.medium.com/download-data-from-onedrive-using-command-line-d27196a676d9) is a tutorial for downloading large files from OneDrive on the command line. You can then run `decompress_csv_files.sh` to decompress all the zips in parallel into the ```decompressed``` directory. Make sure you have enough disk space to import the databases for these servers.

Put the decompressed files into ```./import```. The disk space used by each server after import, and by the import directory itself, is:
```
342G    ./neo4j-server-wiki
907G    ./neo4j-server-cc
249G    ./neo4j-server-pes2o
2.3T    ./import 
```
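Before starting, it may help to compare these sizes against the free space on the target disk. The sketch below is one way to do so, assuming the sizes are binary units in the style of `du -h` and that all directories live on the same filesystem; the helper names `parse_size` and `has_enough_space` are hypothetical.

```python
import shutil

# Sizes copied from the listing above.
REQUIRED = {
    "./neo4j-server-wiki": "342G",
    "./neo4j-server-cc": "907G",
    "./neo4j-server-pes2o": "249G",
    "./import": "2.3T",
}

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a human-readable size such as '342G' or '2.3T' to bytes (binary units assumed)."""
    number, unit = float(text[:-1]), text[-1].upper()
    return int(number * UNITS[unit])


def has_enough_space(required_bytes, free_bytes):
    """Return True if the free space covers the requirement."""
    return free_bytes >= required_bytes


total_required = sum(parse_size(size) for size in REQUIRED.values())
free = shutil.disk_usage(".").free  # free bytes on the filesystem holding the current directory
print(f"required: {total_required / 1024**4:.2f} TiB, free: {free / 1024**4:.2f} TiB")
print("enough space" if has_enough_space(total_required, free) else "NOT enough space")
```

If the servers and the import directory are on different disks, check each path separately with `shutil.disk_usage(path)` instead of summing the requirements.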



In [1]:
# Add numeric ids to the CSV files before loading them into the database.
# The numeric ids are needed because we use an external Faiss index instead of
# Neo4j's built-in vector index, which does not handle billion-scale vectors well.

from atlas_rag.utils.csv_add_column import add_csv_columns

decompressed_dir = "/data/jbai/autoschema_servers/import"
for filename_pattern in ["cc_en", "en_simple_wiki_v0", "pes2o_abstract"]:
    add_csv_columns(
        node_csv=f"{decompressed_dir}/triple_nodes_{filename_pattern}_from_json_without_emb.csv",
        edge_csv=f"{decompressed_dir}/triple_edges_{filename_pattern}_from_json_without_emb_full_concept.csv",
        text_csv=f"{decompressed_dir}/text_nodes_{filename_pattern}_from_json.csv",
        node_with_numeric_id=f"{decompressed_dir}/triple_nodes_{filename_pattern}_from_json_without_emb_with_numeric_id.csv",
        edge_with_numeric_id=f"{decompressed_dir}/triple_edges_{filename_pattern}_from_json_without_emb_full_concept_with_numeric_id.csv",
        text_with_numeric_id=f"{decompressed_dir}/text_nodes_{filename_pattern}_from_json_with_numeric_id.csv",
    )
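
The idea of attaching a numeric id to every row can be illustrated with a small standalone sketch. This is not the implementation of `add_csv_columns`, whose internals are not shown here; the helper `add_numeric_id`, the column name `numeric_id`, and the sample header are all hypothetical. A sequential id like this is what lets a row in the graph be matched to a position in an external vector index.

```python
import csv
import io


def add_numeric_id(src, dst, column="numeric_id"):
    """Copy a CSV from src to dst, appending a 0-based row counter as a new column.

    Returns the number of data rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    writer.writerow(header + [column])
    n = 0
    for row in reader:
        writer.writerow(row + [n])
        n += 1
    return n


# Tiny in-memory example with made-up columns and rows.
src = io.StringIO("name,type\nParis,entity\nFrance,entity\n")
dst = io.StringIO()
rows_written = add_numeric_id(src, dst)
print(dst.getvalue())
```

For real files, pass file objects opened with `newline=""`, as the `csv` module documentation recommends.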




For the wiki dataset:

``` shell
./neo4j-server-wiki/bin/neo4j-admin database import full wiki-csv-json-text \
    --nodes=./import/text_nodes_en_simple_wiki_v0_from_json_with_numeric_id.csv \
    ./import/triple_nodes_en_simple_wiki_v0_from_json_without_emb_with_numeric_id.csv \
    ./import/concept_nodes_en_simple_wiki_v0_from_json_without_emb.csv \
    --relationships=./import/text_edges_en_simple_wiki_v0_from_json.csv \
    ./import/triple_edges_en_simple_wiki_v0_from_json_without_emb_full_concept_with_numeric_id.csv \
    ./import/concept_edges_en_simple_wiki_v0_from_json_without_emb.csv \
    --overwrite-destination \
    --multiline-fields=true \
    --id-type=string \
    --verbose --skip-bad-relationships=true
```

For the pes2o dataset:

``` shell
./neo4j-server-pes2o/bin/neo4j-admin database import full pes2o-csv-json-text \
    --nodes=./import/text_nodes_pes2o_abstract_from_json_with_numeric_id.csv \
    ./import/triple_nodes_pes2o_abstract_from_json_without_emb_with_numeric_id.csv \
    ./import/concept_nodes_pes2o_abstract_from_json_without_emb.csv \
    --relationships=./import/text_edges_pes2o_abstract_from_json.csv \
    ./import/triple_edges_pes2o_abstract_from_json_without_emb_full_concept_with_numeric_id.csv \
    ./import/concept_edges_pes2o_abstract_from_json_without_emb.csv \
    --overwrite-destination \
    --multiline-fields=true \
    --verbose --skip-bad-relationships=true --bad-tolerance=100000
```

For the cc dataset:

``` shell
./neo4j-server-cc/bin/neo4j-admin database import full cc-csv-json-text \
    --nodes=./import/text_nodes_cc_en_from_json_with_numeric_id.csv \
    ./import/triple_nodes_cc_en_from_json_without_emb_with_numeric_id.csv \
    ./import/concept_nodes_cc_en_from_json_without_emb.csv \
    --relationships=./import/text_edges_cc_en_from_json.csv \
    ./import/triple_edges_cc_en_from_json_without_emb_full_concept_with_numeric_id.csv \
    ./import/concept_edges_cc_en_from_json_without_emb.csv \
    --overwrite-destination \
    --multiline-fields=true \
    --verbose --skip-bad-relationships=true
```
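
The three commands differ only in the server directory, the database name, the dataset name inside the file names, and a few extra flags. The sketch below builds the same argument list from those parameters. It uses only the flags and file-name patterns shown above; the helper `build_import_command` is hypothetical and not part of the repository, and the extra flags are placed at the end rather than in the exact position used above.

```python
import shlex


def build_import_command(server_dir, database, dataset, import_dir="./import", extra_flags=()):
    """Build the neo4j-admin import argument list for one dataset."""
    nodes = [
        f"{import_dir}/text_nodes_{dataset}_from_json_with_numeric_id.csv",
        f"{import_dir}/triple_nodes_{dataset}_from_json_without_emb_with_numeric_id.csv",
        f"{import_dir}/concept_nodes_{dataset}_from_json_without_emb.csv",
    ]
    relationships = [
        f"{import_dir}/text_edges_{dataset}_from_json.csv",
        f"{import_dir}/triple_edges_{dataset}_from_json_without_emb_full_concept_with_numeric_id.csv",
        f"{import_dir}/concept_edges_{dataset}_from_json_without_emb.csv",
    ]
    return [
        f"{server_dir}/bin/neo4j-admin", "database", "import", "full", database,
        f"--nodes={nodes[0]}", *nodes[1:],
        f"--relationships={relationships[0]}", *relationships[1:],
        "--overwrite-destination",
        "--multiline-fields=true",
        "--verbose",
        "--skip-bad-relationships=true",
        *extra_flags,
    ]


wiki_cmd = build_import_command(
    "./neo4j-server-wiki", "wiki-csv-json-text", "en_simple_wiki_v0",
    extra_flags=["--id-type=string"],
)
pes2o_cmd = build_import_command(
    "./neo4j-server-pes2o", "pes2o-csv-json-text", "pes2o_abstract",
    extra_flags=["--bad-tolerance=100000"],
)
cc_cmd = build_import_command("./neo4j-server-cc", "cc-csv-json-text", "cc_en")

# Print the commands for inspection; pass a list to subprocess.run(...) to execute it.
for cmd in (wiki_cmd, pes2o_cmd, cc_cmd):
    print(shlex.join(cmd))
```

Printing the command first makes it easy to compare against the shell versions above before running anything that overwrites a database.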
