# NetworkX ATLAS KG construction and RAG example
This notebook walks through the full process of building a knowledge graph (KG) with the atlas-rag package and then performing retrieval-augmented generation (RAG) on it with our RAG methods.

## ATLAS KG construction
We suggest running the KG construction code with a local Hugging Face model. LLM API providers often serve optimized, lightweight variants of a model to reduce costs (for example, by changing numeric precision, such as from fp16 to bf16), which may degrade output quality and makes performance hard to guarantee.
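
To make the precision remark concrete, the sketch below rounds one value to fp16 and to bf16 using only the standard library. fp16 keeps 10 fraction bits and bf16 keeps 7, so the same number is stored with a different rounding error in each format. The helper names `to_fp16` and `to_bf16` are illustrative and not part of atlas-rag; bf16 is emulated here by rounding the upper 16 bits of a float32 encoding, and NaN/infinity handling is omitted.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to IEEE half precision (fp16) and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping the top 16 bits of the float32 encoding,
    rounding to nearest even. Illustrative only: NaN/inf are not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even bias
    bits = (bits + bias) & 0xFFFF0000   # drop the low 16 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

value = 1 / 3
fp16_error = abs(to_fp16(value) - value)
bf16_error = abs(to_bf16(value) - value)
print(f"fp16: {to_fp16(value)!r}  abs error: {fp16_error:.2e}")
print(f"bf16: {to_bf16(value)!r}  abs error: {bf16_error:.2e}")
```

Neither format is strictly better: bf16 trades fraction bits for the wider exponent range of float32, so outputs from a served model can differ from those of a local checkpoint run at another precision.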

ATLAS KG construction consists of five steps:
- Triples JSON generation (base KG JSON)
- Convert triples JSON to triples CSV
- Conceptualize entities in the triples CSV
- Merge the concept CSV into the triples CSV
- Convert the CSV to GraphML so that NetworkX can be used for RAG
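
The data flow behind steps 2 to 4 can be sketched with the standard library. The field names (`head`, `relation`, `tail`, `head_concept`, `tail_concept`), the toy triples, and the helper functions are all hypothetical and chosen only for illustration; the files that atlas-rag actually writes may use a different schema.

```python
import csv
import io
import json

# Hypothetical triples JSON, standing in for the output of step 1.
triples_json = json.dumps([
    {"head": "Alice", "relation": "works at", "tail": "Acme"},
    {"head": "Acme", "relation": "located in", "tail": "Berlin"},
])

# Hypothetical entity-to-concept mapping, standing in for the output of step 3.
concepts = {"Alice": "person", "Acme": "company", "Berlin": "city"}

def triples_to_csv(raw_json: str) -> str:
    """Step 2: flatten a JSON list of triples into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["head", "relation", "tail"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(json.loads(raw_json))
    return buffer.getvalue()

def merge_concepts(triples_csv: str, entity_concepts: dict) -> list:
    """Step 4: attach a concept to the head and tail of every triple row."""
    rows = []
    for row in csv.DictReader(io.StringIO(triples_csv)):
        row["head_concept"] = entity_concepts.get(row["head"], "")
        row["tail_concept"] = entity_concepts.get(row["tail"], "")
        rows.append(row)
    return rows

triples_csv = triples_to_csv(triples_json)
merged_rows = merge_concepts(triples_csv, concepts)
print(triples_csv)
print(merged_rows[0])
```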

In [1]:
from atlas_rag import TripleGenerator, KnowledgeGraphExtractor, ProcessingConfig
from openai import OpenAI
from transformers import pipeline
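# Option 1: OpenAI-compatible API client (uncomment to use instead of the local model)
# client = OpenAI(api_key='<your_api_key>', base_url="<your_api_base_url>")
# model_name = "meta-llama/llama-3.1-8b-instruct"

# Option 2 (used here): local Hugging Face model
model_name = "meta-llama/Llama-3.1-8B-Instruct"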
client = pipeline(
    "text-generation",
    model=model_name,
    device_map="auto",
)
keyword = 'Dulce'
output_directory = f'import/{keyword}'
triple_generator = TripleGenerator(client, model_name=model_name)

Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00,  3.19s/it]


In [2]:
kg_extraction_config = ProcessingConfig(
    model_path=model_name,
    data_directory="tests",
    filename_pattern=keyword,
    batch_size=2,
    output_directory=output_directory,
)
kg_extractor = KnowledgeGraphExtractor(model=triple_generator, config=kg_extraction_config)

### Triples Generation

In [None]:
# Construct the entity and event graph (triples JSON)
kg_extractor.run_extraction()

In [None]:
# Convert triples JSON to CSV
kg_extractor.convert_json_to_csv()

In [None]:
# Concept Generation
kg_extractor.generate_concept_csv_temp(batch_size=64)

In [None]:
# Merge the generated concepts into the triples CSV
kg_extractor.create_concept_csv()

In [3]:
# Convert the CSV to GraphML for NetworkX
kg_extractor.convert_to_graphml()
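
Before moving on to RAG, it can help to confirm that the GraphML file was written and is non-empty. The sketch below counts nodes and edges using only the standard library. The helper `count_graphml_elements` is illustrative and not part of atlas-rag, and the path is assumed from the loader message that appears in this notebook's output (`import/Dulce/kg_graphml/Dulce_graph.graphml`); adjust it if your run writes elsewhere.

```python
import os
import xml.etree.ElementTree as ET

# XML namespace used by GraphML documents.
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"

def count_graphml_elements(path: str) -> tuple:
    """Stream through a GraphML file and return (node_count, edge_count)."""
    nodes = edges = 0
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == GRAPHML_NS + "node":
            nodes += 1
        elif elem.tag == GRAPHML_NS + "edge":
            edges += 1
        elem.clear()  # release element contents once they have been processed
    return nodes, edges

# Assumed location, taken from the loader message in this notebook's output.
graphml_path = "import/Dulce/kg_graphml/Dulce_graph.graphml"
if os.path.exists(graphml_path):
    node_count, edge_count = count_graphml_elements(graphml_path)
    print(f"{node_count} nodes, {edge_count} edges")
else:
    print(f"{graphml_path} not found; run the conversion step first")
```

For actual graph operations, `networkx.read_graphml(graphml_path)` loads the same file into a NetworkX graph.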

## ATLAS RAG

To perform RAG, first create embeddings and a FAISS index for the constructed KG.
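
Conceptually, an embedding index stores one vector per node, edge, or passage and returns the entries closest to a query vector. The sketch below shows that idea with brute-force cosine similarity over hand-written toy vectors. It is a stand-in written for illustration, not FAISS and not the atlas-rag implementation; the names `normalize` and `search` and the example vectors are hypothetical.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def search(index, query, top_k=2):
    """Return the top_k (label, score) pairs ranked by cosine similarity."""
    q = normalize(query)
    scored = [(label, sum(a * b for a, b in zip(vec, q))) for label, vec in index]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]

# Toy 3-dimensional "embeddings"; a real sentence encoder produces
# much higher-dimensional vectors.
toy_embeddings = {
    "node: Alice": [0.9, 0.1, 0.0],
    "node: Acme": [0.1, 0.9, 0.1],
    "edge: Alice works at Acme": [0.6, 0.6, 0.1],
}
index = [(label, normalize(vec)) for label, vec in toy_embeddings.items()]

results = search(index, query=[1.0, 0.2, 0.0], top_k=2)
for label, score in results:
    print(f"{score:.3f}  {label}")
```

FAISS provides the same kind of nearest-neighbor lookup with index structures that scale to far larger collections.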

In [3]:
from sentence_transformers import SentenceTransformer
# Load the SentenceTransformer model
sentence_model = SentenceTransformer('all-MiniLM-L6-v2')



In [4]:
from atlas_rag import create_embeddings_and_index
keyword = 'Dulce'
output_directory = f'import/{keyword}'
create_embeddings_and_index(
    sentence_encoder=sentence_model,
    model_name='all-MiniLM-L6-v2',
    output_directory=output_directory,
    keyword=keyword,
    include_concept=True,
    include_events=True,
)

Using encoder model: all-MiniLM-L6-v2
Loading graph from import/Dulce/kg_graphml/Dulce_graph.graphml


100%|██████████| 1183/1183 [00:00<00:00, 1415248.61it/s]
100%|██████████| 1183/1183 [00:00<00:00, 896776.00it/s]
2470it [00:00, 1549959.74it/s]


Computing text embeddings...


Encoding texts: 100%|██████████| 2/2 [00:00<00:00,  7.38it/s]
100%|██████████| 1/1 [00:00<00:00, 2313.46it/s]


Node and edge embeddings not found, computing...


Encoding nodes: 100%|██████████| 30/30 [00:00<00:00, 50.52it/s]
Encoding edges: 100%|██████████| 54/54 [00:01<00:00, 47.60it/s]


Graph embeddings computed


100%|██████████| 37/37 [00:00<00:00, 307.03it/s]
100%|██████████| 67/67 [00:00<00:00, 290.72it/s]
