# NetworkX ATLAS KG construction and RAG example
This notebook walks through the full process of building a knowledge graph (KG) with the atlas-rag package and then running retrieval-augmented generation (RAG) over it with the retrieval methods the package provides.

## ATLAS KG construction
We suggest running the KG construction code with a local Hugging Face model. LLM API providers often serve optimized or lightweight variants of a model to reduce costs (for example, by changing the numeric precision from fp16 to bf16), which can sacrifice output quality and makes performance hard to guarantee.
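
To make the precision remark concrete, the sketch below compares how fp16 and bf16 round the same value, using only the standard library. The helper names `to_fp16` and `to_bf16` are hypothetical, and the bf16 conversion truncates instead of rounding to nearest, so treat it as an illustration of the two formats and not as a description of how any provider serves models.

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Approximate bfloat16 (7 mantissa bits) by keeping the top 16 bits of float32.

    Assumption: plain truncation; real hardware usually rounds to nearest.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


value = 0.1
fp16_error = abs(to_fp16(value) - value)
bf16_error = abs(to_bf16(value) - value)
print(f"fp16: {to_fp16(value)!r} (error {fp16_error:.2e})")
print(f"bf16: {to_bf16(value)!r} (error {bf16_error:.2e})")
```

bf16 trades mantissa bits for a wider exponent range, so the same value is stored with a larger rounding error than in fp16.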

ATLAS KG construction consists of five steps:
- Generate triples JSON (base KG JSON)
- Convert triples JSON to triples CSV
- Conceptualize entities in the triples CSV
- Merge the concept CSV into the triples CSV
- Convert the CSV to GraphML so that NetworkX can be used for RAG
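
The second step is essentially a flattening operation. The sketch below shows the idea with the standard library only. The record layout (`id`, `triples`, `head`, `relation`, `tail`) and the helper name `triples_json_to_csv` are hypothetical and are not the schema atlas-rag writes; in this notebook, `kg_extractor.convert_json_to_csv()` performs the real conversion.

```python
import csv
import io
import json


def triples_json_to_csv(json_lines: str) -> str:
    """Flatten JSON-lines triple records into CSV text, one row per triple."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["head", "relation", "tail"])
    for line in json_lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for triple in record["triples"]:
            writer.writerow([triple["head"], triple["relation"], triple["tail"]])
    return out.getvalue()


# Hypothetical input: one JSON object per line, each with a list of triples.
sample = "\n".join([
    json.dumps({"id": "doc-1", "triples": [
        {"head": "Alice", "relation": "works at", "tail": "Acme"},
        {"head": "Acme", "relation": "is located in", "tail": "Paris"},
    ]}),
    json.dumps({"id": "doc-2", "triples": []}),
])
csv_text = triples_json_to_csv(sample)
print(csv_text)
```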

In [None]:
from atlas_rag import TripleGenerator, KnowledgeGraphExtractor, ProcessingConfig
from openai import OpenAI
from transformers import pipeline
# client = OpenAI(api_key='<your_api_key>',base_url="<your_api_base_url>") 
# model_name = "meta-llama/llama-3.1-8b-instruct"

model_name = "meta-llama/Llama-3.1-8B-Instruct"
client = pipeline(
    "text-generation",
    model=model_name,
    device_map="auto",
)
keyword = 'Dulce'
output_directory = f'import/{keyword}'
triple_generator = TripleGenerator(client, model_name=model_name)

In [None]:
kg_extraction_config = ProcessingConfig(
    model_path=model_name,
    data_directory="tests",
    filename_pattern=keyword,
    batch_size=2,
    output_directory=output_directory,
)
kg_extractor = KnowledgeGraphExtractor(model=triple_generator, config=kg_extraction_config)

### Triples Generation (with OpenAI Package)

In [None]:
# construct entity&event graph
kg_extractor.run_extraction()

In [None]:
# Convert Triples Json to CSV
kg_extractor.convert_json_to_csv()

In [None]:
# Concept Generation
kg_extractor.generate_concept_csv_temp(batch_size=64)

In [None]:
# Merge the concept CSV into the triples CSV
kg_extractor.create_concept_csv()

In [None]:
# convert csv to graphml for networkx
kg_extractor.convert_to_graphml()
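
Once the GraphML file exists, it can be inspected with NetworkX before moving on to retrieval. The sketch below round-trips a toy graph through GraphML to show the calls involved. The node names, the `type` and `relation` attributes, and the temporary file are made up for illustration and may not match the attributes atlas-rag actually writes.

```python
import os
import tempfile

import networkx as nx

# Toy stand-in for a knowledge graph: two nodes joined by one labeled edge.
toy_kg = nx.DiGraph()
toy_kg.add_node("Alice", type="entity")
toy_kg.add_node("Acme", type="entity")
toy_kg.add_edge("Alice", "Acme", relation="works at")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_graph.graphml")
    nx.write_graphml(toy_kg, path)   # serialize to GraphML
    loaded = nx.read_graphml(path)   # parse it back into a NetworkX graph

print(loaded.number_of_nodes(), loaded.number_of_edges())
for head, tail, attrs in loaded.edges(data=True):
    print(head, attrs["relation"], tail)
```

For the graph produced by this notebook, pass the path of the generated `.graphml` file to `nx.read_graphml` instead of the temporary file.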

## ATLAS RAG

To perform RAG, first create embeddings and a FAISS index for the constructed KG.

[Note: results may differ depending on whether NV-Embed-v2 is loaded with AutoModel or with SentenceTransformer.]
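
Conceptually, the index maps a query embedding to its most similar stored embeddings. The pure-Python sketch below does an exact, brute-force inner-product search over normalized vectors, which gives the same ranking an exact flat inner-product index would return. It is a stand-in for illustration only: it is not the FAISS API, and it is not a description of what `create_embeddings_and_index` does internally. The three-dimensional vectors and the helper names are made up.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def search(index_vectors, query, k=2):
    """Return (position, score) pairs for the k stored vectors closest to query."""
    q = normalize(query)
    scores = [
        (position, sum(a * b for a, b in zip(vec, q)))
        for position, vec in enumerate(index_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical node texts and their (tiny) embeddings.
node_texts = ["Alice", "Acme", "Paris"]
node_embeddings = [
    normalize(v) for v in ([1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0])
]

hits = search(node_embeddings, query=[0.9, 0.1, 0.0], k=2)
for position, score in hits:
    print(node_texts[position], round(score, 3))
```

A real index answers the same question but is built to handle hundreds of thousands of high-dimensional vectors efficiently.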

In [1]:
import os 
os.environ['CUDA_VISIBLE_DEVICES'] = '1'  # Set to the GPU you want to use, or '0' for the first GPU
from sentence_transformers import SentenceTransformer
from transformers import AutoModel
# Load the encoder with AutoModel; the commented-out lines show the SentenceTransformer alternative
encoder_model_name = "nvidia/NV-Embed-v2"
# sentence_model = SentenceTransformer(encoder_model_name, trust_remote_code=True, model_kwargs={'device_map': "auto"})
# sentence_model.max_seq_length = 32768
# sentence_model.tokenizer.padding_side="right"
sentence_model = AutoModel.from_pretrained(encoder_model_name, device_map="auto", trust_remote_code=True)


Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00,  2.86s/it]


In [2]:
from openai import OpenAI
from configparser import ConfigParser
# Load the API key from the config file (DeepInfra here; the OpenRouter settings are commented out below)
config = ConfigParser()
config.read('config.ini')
# reader_model_name = "meta-llama/llama-3.3-70b-instruct"
reader_model_name = "meta-llama/Llama-3.3-70B-Instruct"
client = OpenAI(
    # base_url="https://openrouter.ai/api/v1",
    # api_key=config['settings']['OPENROUTER_API_KEY'],
    base_url="https://api.deepinfra.com/v1/openai",
    api_key=config['settings']['DEEPINFRA_API_KEY'],
)
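
The cell above expects a `config.ini` file in the working directory with a `[settings]` section that holds the API key. The sketch below shows that layout, parsed from a string so it runs without touching disk. The key value is a placeholder and the variable names are chosen for illustration.

```python
from configparser import ConfigParser

# Placeholder contents for config.ini; replace the value with a real key.
example_ini = """
[settings]
DEEPINFRA_API_KEY = your-key-here
"""

example_config = ConfigParser()
example_config.read_string(example_ini)
example_api_key = example_config["settings"]["DEEPINFRA_API_KEY"]
print(example_api_key)
```

Note that `ConfigParser.read` silently skips files it cannot find, so a missing `config.ini` only surfaces later as a `KeyError` on `config['settings']`.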

In [3]:
from atlas_rag import create_embeddings_and_index
keyword = 'musique'
working_directory = '/data/httsangaj/atomic-rag/8b'
data = create_embeddings_and_index(
    sentence_encoder=sentence_model,
    model_name = 'nvidia/NV-Embed-v2',
    working_directory=working_directory,
    keyword=keyword,
    include_concept=True,
    include_events=True,
)

Using encoder model: NV-Embed-v2
Loading graph from /data/httsangaj/atomic-rag/8b/kg_graphml/musique_graph.graphml


100%|██████████| 262675/262675 [00:00<00:00, 2378615.03it/s]
100%|██████████| 262675/262675 [00:00<00:00, 1835980.42it/s]
955769it [00:00, 3814251.30it/s]


Text embeddings already computed.
Graph embeddings computed
Node and edge embeddings already computed.


In [4]:
from atlas_rag.reader import LLMGenerator
from atlas_rag.retriever import NvEmbed
llm_generator = LLMGenerator(client=client, model_name=reader_model_name)
sentence_encoder = NvEmbed(sentence_model)

In [None]:
from atlas_rag.evaluation import BenchMarkConfig
benchmark_config = BenchMarkConfig(
    dataset_name='musique',
    question_file="benchmark_data/musique.json",
    include_concept=True,
    include_events=True,
    reader_model_name=reader_model_name,
    encoder_model_name=encoder_model_name,
    number_of_samples=-1, # -1 for all samples
)

In [6]:
from atlas_rag import setup_logger
logger = setup_logger(benchmark_config)

In [7]:
# Initialize desired RAG method for benchmarking
from atlas_rag.retriever import HippoRAG2Retriever
hipporag2_retriever = HippoRAG2Retriever(
    llm_generator=llm_generator,
    sentence_encoder=sentence_encoder,
    data = data,
    logger=logger
)

100%|██████████| 262675/262675 [00:00<00:00, 730077.03it/s]


In [8]:
# start benchmarking
from atlas_rag.evaluation import RAGBenchmark
benchmark = RAGBenchmark(config=benchmark_config, logger=logger)
benchmark.run([hipporag2_retriever], llm_generator=llm_generator)

Data loaded from benchmark_data/musique.json
Using only the first 5 samples from the dataset


100%|██████████| 5/5 [01:13<00:00, 14.77s/it]
