# NetworkX ATLAS KG construction and RAG example
This notebook demonstrates the full, streamlined process of creating a knowledge graph (KG) with the atlas-rag package and then performing retrieval-augmented generation (RAG) over it with our RAG methods.

## ATLAS KG construction
We suggest running the KG construction code with a local Hugging Face model. LLM API providers may serve optimized, lightweight variants of a model to reduce costs (for example, by changing numeric precision from fp16 to bf16). This can sacrifice output quality and makes performance hard to guarantee.

ATLAS KG construction consists of five steps:
- Triples JSON generation (base KG JSON)
- Convert the triples JSON to triples CSV
- Conceptualize entities in the triples CSV
- Merge the concept CSV into the triples CSV
- Convert the CSV to GraphML so that NetworkX can perform RAG
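The five steps map onto the `KnowledgeGraphExtractor` methods called in the cells below, in this order. As a minimal sketch, the ordering can be captured in a small driver. `PIPELINE_STEPS` and `run_pipeline` are hypothetical helpers, not part of atlas-rag, and the `batch_size=64` value is simply the one used in this notebook.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical step table: method names are the ones called in this notebook,
# listed in the order of the five construction steps.
PIPELINE_STEPS: List[Tuple[str, Dict[str, Any]]] = [
    ("run_extraction", {}),                             # 1. triples JSON
    ("convert_json_to_csv", {}),                        # 2. JSON -> CSV
    ("generate_concept_csv_temp", {"batch_size": 64}),  # 3. conceptualize entities
    ("create_concept_csv", {}),                         # 4. merge concepts into triples CSV
    ("convert_to_graphml", {}),                         # 5. CSV -> GraphML
]


def run_pipeline(extractor: Any, skip: Iterable[str] = ()) -> List[str]:
    """Call each step on `extractor` in order, skipping any names in `skip`.

    Returns the names of the steps that were actually run.
    """
    skipped = set(skip)
    completed: List[str] = []
    for name, kwargs in PIPELINE_STEPS:
        if name in skipped:
            continue
        getattr(extractor, name)(**kwargs)
        completed.append(name)
    return completed
```

With a real extractor this would be `run_pipeline(kg_extractor)`; passing `skip={"generate_concept_csv_temp"}` would reuse a concept CSV produced by an earlier run.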

In [1]:
from atlas_rag import TripleGenerator, KnowledgeGraphExtractor, ProcessingConfig
from openai import OpenAI
from transformers import pipeline
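# Option A: an OpenAI-compatible API service
# client = OpenAI(api_key='<your_api_key>', base_url="<your_api_base_url>")
# model_name = "meta-llama/llama-3.1-8b-instruct"

# Option B (recommended above): a local Hugging Face text-generation pipeline
model_name = "meta-llama/Llama-3.1-8B-Instruct"
client = pipeline(
    "text-generation",
    model=model_name,
    device_map="auto",
)
keyword = 'Dulce'  # used as the filename pattern for input files (here tests/Dulce.json)
output_directory = f'import/{keyword}'
triple_generator = TripleGenerator(client, model_name=model_name)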

Loading checkpoint shards: 100%|██████████| 4/4 [00:09<00:00,  2.48s/it]


In [2]:
kg_extraction_config = ProcessingConfig(
    model_path=model_name,
    data_directory="tests",
    filename_pattern=keyword,
    batch_size=2,
    output_directory=output_directory,
)
kg_extractor = KnowledgeGraphExtractor(model=triple_generator, config=kg_extraction_config)

### Triples Generation

In [3]:
# Step 1: construct the entity & event graph (triples JSON)
kg_extractor.run_extraction()

Found data files: ['Dulce.json']


Generating train split: 3 examples [00:00, 243.18 examples/s]


Model: meta-llama/Llama-3.1-8B-Instruct


  0%|          | 0/4 [00:00<?, ?it/s]Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
 25%|██▌       | 1/4 [04:17<12:53, 257.95s/it]

Item 9 Entity must be a non-empty array. Problematic item: {'Event': 'The drone lingered and then retreated into the shadows', 'Entity': []}


 75%|███████▌  | 3/4 [10:05<03:14, 194.87s/it]
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
100%|██████████| 4/4 [12:41<00:00, 190.36s/it]
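The `Item 9 Entity must be a non-empty array` line in the output comes from a check that every extracted event lists at least one entity. A minimal sketch of such a check follows, assuming invalid items are simply set aside; the notebook does not show whether atlas-rag drops or repairs them, and `split_valid_events` is a hypothetical name. The first toy event is made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def split_valid_events(items: List[Dict[str, Any]]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Separate event items with a non-empty 'Entity' list from invalid ones.

    Returns (valid_items, error_messages); messages mimic the log line above.
    """
    valid: List[Dict[str, Any]] = []
    errors: List[str] = []
    for index, item in enumerate(items):
        entities = item.get("Entity")
        if isinstance(entities, list) and len(entities) > 0:
            valid.append(item)
        else:
            errors.append(
                f"Item {index} Entity must be a non-empty array. Problematic item: {item}"
            )
    return valid, errors


events = [
    {"Event": "Alex briefed the team at Dulce", "Entity": ["Alex", "team", "Dulce"]},
    {"Event": "The drone lingered and then retreated into the shadows", "Entity": []},
]
kept, problems = split_valid_events(events)
```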


In [4]:
# Step 2: convert the triples JSON to CSV
kg_extractor.convert_json_to_csv()

Loading data from the json files
Number of files:  1


100%|██████████| 1/1 [00:00<00:00, 149.68it/s]

Processing file for file ids:  hf-meta-llama_Meta-Llama-3.1-8B-Instruct_Dulce_output_20250530170131_1_in_1.json
Data to CSV completed successfully, start computing embeddings.





In [12]:
# Step 3: concept generation
kg_extractor.generate_concept_csv_temp(batch_size=64)

TypeError: generate_concept() got an unexpected keyword argument 'input_dir'

In [5]:
# Step 4: merge the concept CSV into the triples CSV
kg_extractor.create_concept_csv()

Loading concepts...


454it [00:00, 140201.30it/s]


Loading concepts done.
Relation to concepts: 117
Node to concepts: 337
Processing triple nodes...


337it [00:00, 23445.08it/s]


Processing concept nodes...


100%|██████████| 838/838 [00:00<00:00, 183283.45it/s]


Processing triple edges...


392it [00:00, 52257.16it/s]
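Step 4 attaches the generated concepts to the triple nodes and edges (the log above reports 337 nodes and 117 relations mapped to concepts). The sketch below illustrates the idea on node rows only. It assumes a `name:ID` column, as in the headers printed in the Neo4j cell of this notebook, and a `|`-joined `concepts` column; the separator and the helper `attach_concepts` are hypothetical, not atlas-rag's actual format. The toy concept values are made up.

```python
import csv
import io
from typing import Dict, List


def attach_concepts(node_csv: str, node_to_concepts: Dict[str, List[str]], sep: str = "|") -> str:
    """Return node CSV text with a 'concepts' column filled from node_to_concepts."""
    reader = csv.DictReader(io.StringIO(node_csv))
    fieldnames = list(reader.fieldnames or [])
    if "concepts" not in fieldnames:
        fieldnames.append("concepts")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Nodes without any concept get an empty cell rather than being dropped.
        row["concepts"] = sep.join(node_to_concepts.get(row["name:ID"], []))
        writer.writerow(row)
    return out.getvalue()


nodes = "name:ID,type\nDulce,entity\nAlex,entity\n"
merged = attach_concepts(nodes, {"Dulce": ["military base", "location"]})
```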


## Choice 1: Convert to GraphML for NetworkX RAG

In [6]:
# Step 5: convert the CSV to GraphML for NetworkX
kg_extractor.convert_to_graphml()

## Choice 2: Convert to Neo4j dumps

In [7]:
# Add numeric IDs to the CSVs so that vector indices can be used
kg_extractor.add_numeric_id()

['name:ID', 'type', 'concepts', 'synsets', ':LABEL']


Adding numeric ID: 337it [00:00, 129120.35it/s]


[':START_ID', ':END_ID', 'relation', 'concepts', 'synsets', ':TYPE']


Adding numeric ID: 392it [00:00, 67040.46it/s]


['text_id:ID', 'original_text', ':LABEL']


Adding numeric ID: 8it [00:00, 5017.86it/s]
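`add_numeric_id` adds an integer identifier to the node, edge and text CSVs whose headers are printed above, since vector indices typically address entries by integer position. A minimal sketch of that idea, assuming the IDs are simply 0-based row positions; the column name `numeric_id` and the helper `add_numeric_ids` are hypothetical, and atlas-rag's actual column name and numbering may differ.

```python
import csv
import io


def add_numeric_ids(csv_text: str, column: str = "numeric_id") -> str:
    """Append a 0-based row-position column to CSV text (header row excluded)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + [column])
    for position, row in enumerate(reader):
        writer.writerow(row + [str(position)])
    return out.getvalue()


text_csv = "text_id:ID,original_text,:LABEL\nt1,First passage,Text\nt2,Second passage,Text\n"
print(add_numeric_ids(text_csv))
```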


## ATLAS RAG

To perform RAG, you first need to create embeddings and a FAISS index for the constructed KG.

[There may be a performance difference between loading NV-Embed-v2 with `AutoModel` and with Sentence Transformers.]
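When embeddings are L2-normalized (the `normalize_embeddings=True` option used when building the index), every vector has unit length, so ranking by inner product gives the same order as ranking by cosine similarity. The pure-Python sketch below performs that exact search; a FAISS flat inner-product index (for example `faiss.IndexFlatIP`) does the same computation at scale. Which FAISS index type atlas-rag actually builds is not shown in this notebook, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def top_k_inner_product(
    query: Sequence[float], corpus: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Exact top-k search by inner product over L2-normalized vectors (= cosine)."""
    q = l2_normalize(query)
    scores = [
        (index, sum(a * b for a, b in zip(q, l2_normalize(doc))))
        for index, doc in enumerate(corpus)
    ]
    # Highest score first; ties broken by lower index for determinism.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k_inner_product([1.0, 0.1], corpus, k=2))
```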

In [1]:
import os
# Must be set before torch initializes CUDA: '1,2' exposes GPUs 1 and 2, '0' only the first GPU
os.environ['CUDA_VISIBLE_DEVICES'] = '1,2'
import torch
num_gpus = torch.cuda.device_count()
print("number of GPUs available:", num_gpus)
for i in range(num_gpus):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

number of GPUs available: 2
GPU 0: NVIDIA L20
GPU 1: NVIDIA L20


In [2]:
from sentence_transformers import SentenceTransformer
from atlas_rag.retriever import NvEmbed
from transformers import AutoModel
encoder_model_name = "nvidia/NV-Embed-v2"
# Option A: load NV-Embed-v2 through SentenceTransformer
# sentence_model = SentenceTransformer(encoder_model_name, trust_remote_code=True, model_kwargs={'device_map': "auto"})
# sentence_model.max_seq_length = 32768
# sentence_model.tokenizer.padding_side="right"
# Option B (used here): load NV-Embed-v2 through transformers' AutoModel
sentence_model = AutoModel.from_pretrained(encoder_model_name, trust_remote_code=True, device_map="auto")
# Wrap the model with atlas-rag's NvEmbed encoder
sentence_encoder = NvEmbed(sentence_model)

Loading checkpoint shards: 100%|██████████| 4/4 [00:10<00:00,  2.62s/it]


In [3]:
from openai import OpenAI
from atlas_rag.reader import LLMGenerator
from configparser import ConfigParser
# Load the API keys (OpenRouter or DeepInfra) from config.ini
config = ConfigParser()
config.read('config.ini')
# reader_model_name = "meta-llama/llama-3.3-70b-instruct"
reader_model_name = "meta-llama/Llama-3.3-70B-Instruct"
client = OpenAI(
    # base_url="https://openrouter.ai/api/v1",
    # api_key=config['settings']['OPENROUTER_API_KEY'],
    base_url="https://api.deepinfra.com/v1/openai",
    api_key=config['settings']['DEEPINFRA_API_KEY'],
)
llm_generator = LLMGenerator(client=client, model_name=reader_model_name)
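The cell above expects a `config.ini` with a `[settings]` section holding the API keys. A sketch of that file follows, parsed from a string here so it runs without a real file; the values are placeholders, not real keys. `ConfigParser` treats option names case-insensitively, so `DEEPINFRA_API_KEY` resolves regardless of how it is capitalized in the file.

```python
from configparser import ConfigParser

# Contents you would save as config.ini (placeholder values).
CONFIG_INI = """
[settings]
OPENROUTER_API_KEY = <your_openrouter_api_key>
DEEPINFRA_API_KEY = <your_deepinfra_api_key>
"""

config = ConfigParser()
config.read_string(CONFIG_INI)
api_key = config['settings']['DEEPINFRA_API_KEY']
```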

In [4]:
from atlas_rag import create_embeddings_and_index
# This section uses a separate MuSiQue KG rather than the Dulce KG built above
keyword = 'musique'
working_directory = '/data/httsangaj/atomic-rag/8b'
data = create_embeddings_and_index(
    sentence_encoder=sentence_encoder,
    model_name=encoder_model_name,  # "nvidia/NV-Embed-v2"
    working_directory=working_directory,
    keyword=keyword,
    include_concept=True,
    include_events=True,
    normalize_embeddings=True,
    batch_size=32,
)

Using encoder model: NV-Embed-v2
Loading graph from /data/httsangaj/atomic-rag/8b/kg_graphml/musique_graph.graphml


100%|██████████| 262675/262675 [00:00<00:00, 2168661.91it/s]
100%|██████████| 262675/262675 [00:00<00:00, 1802220.44it/s]
955769it [00:00, 3748774.49it/s]


Computing text embeddings...


Encoding texts: 100%|██████████| 365/365 [22:33<00:00,  3.71s/it]
100%|██████████| 365/365 [00:09<00:00, 36.65it/s]


Node and edge embeddings not found, computing...


Encoding nodes:   1%|          | 71/7845 [02:23<4:21:15,  2.02s/it]


KeyboardInterrupt: 

In [None]:
from atlas_rag.evaluation import BenchMarkConfig
benchmark_config = BenchMarkConfig(
    dataset_name='musique',
    question_file="benchmark_data/musique.json",
    include_concept=True,
    include_events=True,
    reader_model_name=reader_model_name,
    encoder_model_name=encoder_model_name,
    number_of_samples=-1,  # -1 for all samples
)

In [None]:
from atlas_rag import setup_logger
logger = setup_logger(benchmark_config)

In [None]:
# Initialize desired RAG method for benchmarking
from atlas_rag.retriever import HippoRAG2Retriever
hipporag2_retriever = HippoRAG2Retriever(
    llm_generator=llm_generator,
    sentence_encoder=sentence_encoder,
    data=data,
    logger=logger,
)

## Investigating the cause of the performance difference
- Different CUDA versions?
- Different Hugging Face library versions?
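To compare environments, it helps to record the installed library versions and the CUDA version PyTorch was built against. A small sketch using the standard `importlib.metadata` module; the package list (including the distribution name `atlas-rag`) is an assumption, so adjust it to the libraries in your setup.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def torch_cuda_version() -> Optional[str]:
    """CUDA version PyTorch was built against (None for CPU builds or if torch is absent)."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


print(package_versions(["torch", "transformers", "sentence-transformers", "atlas-rag"]))
print("torch CUDA build:", torch_cuda_version())
```

Running this in both environments and diffing the output narrows down which of the two suspected causes applies.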

In [None]:
# start benchmarking
from atlas_rag.evaluation import RAGBenchmark
benchmark = RAGBenchmark(config=benchmark_config, logger=logger)
benchmark.run([hipporag2_retriever], llm_generator=llm_generator)

## Billion-Level KG RAG

from atlas_rag.billion import 