<a href="https://colab.research.google.com/github/vdtegon/SpotifyPredictMusicStyle/blob/main/20240320_TCD_Tutorial_API_Claude_portugu%C3%AAs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://github.com/Trading-com-Dados/resources/blob/main/TCD%20LOGO%20001%20-%20PRETO.png?raw=true" width="200"/>

# **PDF Summarization with AI - Tutorial with Anthropic Claude**

Welcome to this tutorial, in which we extract the text from a PDF file and summarize it with an AI model from Anthropic. This approach is useful for condensing long documents efficiently.

## **Step 1: Install the required libraries**

In [None]:
!pip install pymupdf --upgrade

Collecting pymupdf
  Downloading pymupdf-1.25.4-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Downloading pymupdf-1.25.4-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (20.0 MB)
Installing collected packages: pymupdf
Successfully installed pymupdf-1.25.4


## **Step 2: Import the required libraries**

In [None]:
import pymupdf as pp
import requests
import json

## **Step 3: Configure the Claude API key**

Go to https://console.anthropic.com/ and create your API key.

In [None]:
# Replace with your own Claude API key
ANTHROPIC_API_KEY = "YOUR KEY HERE"
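
A key pasted directly into a notebook is easy to leak when the notebook is shared. As a minimal sketch of one alternative, the key could be read from an environment variable instead. This assumes the variable is named `ANTHROPIC_API_KEY` and was set before the notebook started; `load_api_key` is a hypothetical helper, not part of any library.

```python
import os

def load_api_key(env_var="ANTHROPIC_API_KEY"):
    """Return the API key stored in the environment, or raise if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key

# Usage (assumes the variable was set before the notebook started):
# ANTHROPIC_API_KEY = load_api_key()
```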

## **Step 4: Function to extract text from a PDF**

In [None]:
# Extract the plain text of every page in a PDF
def extract_text_from_pdf(pdf_path):
    # The context manager closes the file once the text has been read
    with pp.open(pdf_path) as doc:
        text = ""
        for page in doc:
            text += page.get_text("text") + "\n"
    return text
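
If page boundaries matter for the summary, the per-page strings can be labelled before they are joined. The sketch below works on any list of page strings, so it does not depend on PyMuPDF itself. `join_pages_with_markers` is a hypothetical helper and the marker format is an arbitrary choice.

```python
def join_pages_with_markers(page_texts):
    """Join per-page strings, prefixing each with a '--- Page N ---' line."""
    parts = []
    for number, page_text in enumerate(page_texts, start=1):
        parts.append(f"--- Page {number} ---\n{page_text.strip()}")
    return "\n\n".join(parts)

# With PyMuPDF the input could be built as:
# page_texts = [page.get_text("text") for page in pp.open(pdf_path)]
print(join_pages_with_markers(["First page.\n", "Second page.\n"]))
```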

## **Step 5: Function to summarize text using AI**

We will use `claude-3-opus-20240229`, a general-purpose pre-trained language model that performs well at text summarization.

🔹 Claude 3 Opus (Anthropic) - Best for large PDFs

Pros: Its 200K-token context window lets it take in much longer documents than the original GPT-4.

Use case: summarizing long reports and contracts, and answering detailed questions.

In [None]:
# Summarize text using the Claude Messages API
def summarize_with_claude(text):
    api_url = "https://api.anthropic.com/v1/messages"
    headers = {
        "x-api-key": ANTHROPIC_API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json"
    }

    # Ask Claude to summarize the PDF text
    prompt = f"Summarize the following document and highlight the 3 most important insights:\n\n{text[:6000]}"  # Only the first 6000 characters are sent

    payload = {
        "model": "claude-3-opus-20240229",  # Use 'claude-3-sonnet-20240229' for faster results
        "max_tokens": 400,  # Caps the reply length; longer summaries are cut off
        "messages": [{"role": "user", "content": prompt}]
    }

    response = requests.post(api_url, headers=headers, json=payload)

    if response.status_code == 200:
        result = response.json()
        return result["content"][0]["text"]
    else:
        return f"Error: {response.text}"
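
Because `summarize_with_claude` sends only the first 6000 characters, anything after that point in a long PDF never reaches the model. One possible workaround, sketched here under the assumption that paragraphs are separated by blank lines, is to split the text into pieces that each fit the limit. `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text, max_chars=6000):
    """Split text into pieces of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is longer than the limit.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        # Try to append the paragraph to the chunk being built.
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to `summarize_with_claude`, and the partial summaries joined and summarized once more. This costs one API call per chunk, so it is a trade-off rather than a drop-in improvement.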

## **Step 6: Upload a PDF file**

Now we will upload a **PDF file** from our local machine.

In [None]:
# Upload a PDF file (works only inside Google Colab)
from google.colab import files
uploaded = files.upload()
pdf_path = list(uploaded.keys())[0]
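
`files.upload()` returns a dictionary keyed by filename, and taking the first key assumes that exactly one PDF was uploaded. A slightly more defensive sketch picks the first name that ends in `.pdf`. `pick_pdf_path` is a hypothetical helper, and the dictionary in the usage comment stands in for a real upload.

```python
def pick_pdf_path(uploaded):
    """Return the first uploaded filename ending in .pdf, or raise ValueError."""
    for name in uploaded:
        if name.lower().endswith(".pdf"):
            return name
    raise ValueError("No PDF file was uploaded.")

# Usage with a stand-in for the dictionary returned by files.upload():
# pdf_path = pick_pdf_path({"notes.txt": b"...", "report.PDF": b"..."})
```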

Saving Meta-Reports-Fourth-Quarter-and-Full-Year-2024-Results-2025.pdf to Meta-Reports-Fourth-Quarter-and-Full-Year-2024-Results-2025.pdf


## **Step 7: Extract and summarize the PDF content**

Now we extract the text from the uploaded PDF and summarize it using AI.

In [None]:
extracted_text = extract_text_from_pdf(pdf_path)
summary = summarize_with_claude(extracted_text)
print("\n--- Summary of the PDF ---\n")
print(summary)


--- Summary of the PDF ---

Summary:
Meta Platforms, Inc. reported strong financial results for the fourth quarter and full year 2024, with revenue growth of 21% and 22% year-over-year, respectively. The company continues to make progress on AI, glasses, and the future of social media. Meta expects to invest heavily in infrastructure, employee compensation, and capital expenditures in 2025 to support its core business and generative AI efforts.

Three most important insights:
1. Meta's revenue growth remained strong, with 21% and 22% year-over-year increases for the fourth quarter and full year 2024, respectively, driven by increases in ad impressions and average price per ad.

2. The company plans to invest significantly in 2025, with total expenses expected to be in the range of $114-119 billion. The majority of the expense growth will be driven by infrastructure costs and employee compensation, particularly in the areas of infrastructure, monetization, Reality Labs, generative AI, 
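
The summary above breaks off partway through the second insight and the third never appears, which is consistent with the reply reaching the `max_tokens` limit of 400 set in `summarize_with_claude`. The Messages API reports this in the `stop_reason` field of the response body. As a minimal sketch, the parsed response could be checked like this; `extract_summary` is a hypothetical helper, and the `example` dictionary is a hand-written stand-in, not real API output.

```python
def extract_summary(result):
    """Return (text, truncated) from a parsed Messages API response body."""
    text = result["content"][0]["text"]
    # "max_tokens" means the model stopped because it ran out of output budget.
    truncated = result.get("stop_reason") == "max_tokens"
    return text, truncated

# Hand-written stand-in for response.json(); not real API output.
example = {
    "content": [{"type": "text", "text": "Summary: ..."}],
    "stop_reason": "max_tokens",
}
text, truncated = extract_summary(example)
if truncated:
    print("Warning: the reply hit max_tokens and is probably incomplete.")
```

Raising `max_tokens` in the payload is the usual remedy when the check reports a truncated reply.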

In [None]:
import textwrap
wrapped_summary = textwrap.fill(summary, width=60)  # Wrap the text at 60 characters per line
print(wrapped_summary)

Summary: Meta Platforms, Inc. reported strong financial
results for the fourth quarter and full year 2024, with
revenue growth of 21% and 22% year-over-year, respectively.
The company continues to make progress on AI, glasses, and
the future of social media. Meta expects to invest heavily
in infrastructure, employee compensation, and capital
expenditures in 2025 to support its core business and
generative AI efforts.  Three most important insights: 1.
Meta's revenue growth remained strong, with 21% and 22%
year-over-year increases for the fourth quarter and full
year 2024, respectively, driven by increases in ad
impressions and average price per ad.  2. The company plans
to invest significantly in 2025, with total expenses
expected to be in the range of $114-119 billion. The
majority of the expense growth will be driven by
infrastructure costs and employee compensation, particularly
in the areas of infrastructure, monetization, Reality Labs,
generative AI, and regulation and compliance
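
`textwrap.fill` treats its input as a single paragraph, so the line breaks between the summary and the numbered insights are lost in the output above. A small sketch that wraps each line separately keeps them. `wrap_lines` is a hypothetical helper name.

```python
import textwrap

def wrap_lines(text, width=60):
    """Wrap every line on its own so existing line breaks and blank lines survive."""
    return "\n".join(
        textwrap.fill(line, width=width) for line in text.splitlines()
    )

# Usage: print(wrap_lines(summary))
```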