<a href="https://colab.research.google.com/github/trafaon/agile-agent/blob/main/agente_nota_fiscal.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Invoice Query Agent - LlamaIndex + Groq

This notebook lets you ask natural-language questions about invoice data (headers and line items) using:
- LlamaIndex
- An LLM served through Groq
- Hugging Face embeddings
- CSV data from January 2024

To use it, add your Groq API key to the `.env` file under the name `GROQ_API_KEY`, which the setup cell reads.
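
As a minimal sketch of the expected file format, the `.env` file is assumed to hold one `KEY=VALUE` line whose name matches the `GROQ_API_KEY` variable read later in the notebook. The key value below is a placeholder, and `parse_env_text` is a hypothetical helper that only illustrates the format; the notebook itself relies on `load_dotenv` from python-dotenv.

```python
import os

# Hypothetical .env content; the value is a placeholder, not a real key.
ENV_TEXT = """
# Groq credentials
GROQ_API_KEY=your-groq-api-key-here
"""


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


env_values = parse_env_text(ENV_TEXT)

# Like load_dotenv with its default settings, keep any variable that is already set.
for key, value in env_values.items():
    os.environ.setdefault(key, value)

print("GROQ_API_KEY" in os.environ)
```

Keeping the key in `.env` rather than in a code cell means the notebook can be shared without exposing the credential.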


In [1]:
# LlamaIndex, with support for external LLMs and file readers
!pip install -q llama-index

# Groq model integration
!pip install -q llama-index-llms-groq

# Embeddings via Hugging Face
!pip install -q llama-index-embeddings-huggingface

# CSV file reading
!pip install -q llama-index-readers-file

# Load environment variables (.env)
!pip install python-dotenv

In [24]:
from llama_index.core import Settings
from llama_index.llms.groq import Groq

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Paths to the invoice header and invoice item CSV files
cabecalho_path = '/content/drive/MyDrive/Colab Notebooks/202401_NFs_Cabecalho.csv'
itens_path = '/content/drive/MyDrive/Colab Notebooks/202401_NFs_Itens.csv'

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
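
The loading step depends on both CSV paths being correct, so a quick check before indexing can surface a wrong path early. This is a minimal sketch; `missing_files` is a hypothetical helper, not part of LlamaIndex or Colab, and the two paths are the ones defined in the cell above.

```python
from pathlib import Path


def missing_files(paths: list[str]) -> list[str]:
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# The two paths used in this notebook.
csv_paths = [
    "/content/drive/MyDrive/Colab Notebooks/202401_NFs_Cabecalho.csv",
    "/content/drive/MyDrive/Colab Notebooks/202401_NFs_Itens.csv",
]

for path in missing_files(csv_paths):
    print(f"File not found: {path}")
```

Outside Colab, or before Drive is mounted, both paths are reported as missing, which is the expected behaviour of the check.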


In [28]:
from llama_index.core import VectorStoreIndex, Settings
from dotenv import load_dotenv
import os
from pathlib import Path # Import the Path class

# Reader for loading the CSV files
from llama_index.readers.file import CSVReader

# Import the Hugging Face embedding model
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.groq import Groq

# Load environment variables from the .env file
load_dotenv()

# Configure the embedding model (a commonly used model, chosen as an example)
# Any other Hugging Face model can be substituted
Settings.embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Configure the LLM (Groq)
groq_api_key = os.getenv("GROQ_API_KEY")
if groq_api_key is None:
    raise ValueError("GROQ_API_KEY not found in environment variables.")

Settings.llm = Groq(model="llama3-8b-8192", api_key=groq_api_key)  # Example Groq model

reader = CSVReader()

# Load the documents from each CSV file
# Make sure the paths are correct
cabecalho_docs = reader.load_data(file=Path(cabecalho_path))
itens_docs = reader.load_data(file=Path(itens_path))

# Combine the documents into a single list
docs = cabecalho_docs + itens_docs

# Build the index; it uses the embed_model and llm set in Settings
index = VectorStoreIndex.from_documents(docs)

# Create the query engine
query_engine = index.as_query_engine()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


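
Questions about individual invoices may be easier to answer when each CSV row becomes its own piece of text, instead of one large document per file (which is assumed here to be the default behaviour of `CSVReader`). The sketch below uses only the standard library to show one way of turning rows into `column: value` text. `rows_to_texts` and the sample columns are hypothetical; the real CSV column names may differ.

```python
import csv
import io

# Made-up rows for illustration only; not taken from the real invoice files.
SAMPLE_CSV = "invoice_id,supplier,total\n1,Supplier A,100.50\n2,Supplier B,20.00\n"


def rows_to_texts(csv_text: str) -> list[str]:
    """Render each CSV row as a single 'column: value, ...' string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        ", ".join(f"{column}: {value}" for column, value in row.items())
        for row in reader
    ]


texts = rows_to_texts(SAMPLE_CSV)
for text in texts:
    print(text)
```

Each resulting string could then be wrapped in a `Document(text=...)` from `llama_index.core` and passed to `VectorStoreIndex.from_documents`, so that retrieval works at row level.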

In [29]:
# The question is asked in Portuguese: "Which supplier received the highest value?"
resposta = query_engine.query("Qual fornecedor recebeu o maior valor?")
print(resposta)

Based on the provided data, the supplier that received the largest value is J NUNES DISTRIBUIDORA DE ALIMENTOS EIRELI, with a total value of R$ 71.2.
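
Aggregate questions such as "largest total" depend on every row, while a vector query engine typically passes only a few retrieved chunks to the LLM, so it may be worth cross-checking an answer like the one above with a direct computation. This is a minimal sketch using made-up rows and hypothetical column names (`supplier`, `total`) rather than the real CSV schema.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; the real CSV columns may be named differently.
SAMPLE_CSV = """supplier,total
Supplier A,100.50
Supplier B,120.00
Supplier A,30.25
"""


def top_supplier(csv_text: str, supplier_col: str, value_col: str) -> tuple[str, float]:
    """Sum value_col per supplier and return the supplier with the largest total."""
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[supplier_col]] += float(row[value_col])
    if not totals:
        raise ValueError("no rows to aggregate")
    best = max(totals, key=totals.get)
    return best, totals[best]


name, total = top_supplier(SAMPLE_CSV, "supplier", "total")
print(f"{name}: {total:.2f}")
```

To run the same check on the real header file, pass its contents and its actual supplier and value column names; values formatted with a decimal comma would need converting before the call to `float`.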
