<a href="https://colab.research.google.com/github/viniciusds2020/ia_langchain_agent_ml/blob/main/agente_ia_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AI Agent - LangChain - Developing a machine learning model
---

This notebook walks through building an artificial intelligence agent with the LangChain framework and using it to develop a machine learning model. It covers the essential steps of model development, from data collection and preparation to implementing and evaluating the model. By combining machine learning techniques with multi-agent capabilities, the project aims to create automated, intelligent solutions, and it highlights the importance of integrating several tools and libraries, such as Crewai and langchain_community.

### Environment setup

In [None]:
%%writefile requirements.txt
pypdf
chromadb
langchain
langchain-groq
langchain_community
langchain_experimental
duckduckgo-search
sentence-transformers
youtube-search-python

Writing requirements.txt


In [None]:
!pip install --quiet -r requirements.txt

### Dependencies

In [None]:
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType
from langchain.agents import Tool
from langchain_groq import ChatGroq
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.tools import YouTubeSearchTool
from langchain_experimental.utilities import PythonREPL
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.memory import ConversationBufferMemory
from langchain_core.tools import tool
from youtubesearchpython import VideosSearch

### LLM - via Groq

In [None]:
import os

# Read the API key from the environment instead of hard-coding it in the notebook.
llm = ChatGroq(api_key=os.environ["GROQ_API_KEY"],
               model='llama3-70b-8192',  # Models available via Groq: 'llama3-8b-8192', 'mixtral-8x7b-32768', 'gemma-7b-it', 'llama3-70b-8192'
               temperature=0.7)
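
An API key written directly into a notebook cell is easy to leak when the notebook is shared. The following is a minimal sketch of one way to load it, assuming the key lives in an environment variable named `GROQ_API_KEY`; the helper `read_api_key` and its parameters are illustrative and not part of the original notebook.

```python
import os
from getpass import getpass


def read_api_key(var_name: str = "GROQ_API_KEY", env=None) -> str:
    """Return the key from the environment, prompting only if it is missing.

    `read_api_key` is a hypothetical helper, not a LangChain or Groq function.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        # Fall back to an interactive prompt so the key is never saved in a cell.
        key = getpass(f"Enter {var_name}: ").strip()
    if not key:
        raise ValueError(f"{var_name} is empty")
    return key
```

The returned value could then be passed to the model constructor, for example `ChatGroq(api_key=read_api_key(), model='llama3-70b-8192', temperature=0.7)`.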

### Vector database

In [None]:
loader = PyPDFLoader("/content/drive/MyDrive/machinelearning-dg.pdf")
pages = loader.load_and_split()

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2",  # alternative: nomic-ai/nomic-embed-text-v1.5
                                   model_kwargs={'device': 'cpu'},
                                   encode_kwargs = {'normalize_embeddings': False})

# Split the pages into overlapping chunks and index the chunks, not the raw pages.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64)
texts = text_splitter.split_documents(pages)

vectorstore = Chroma.from_documents(texts, embeddings)
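
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch that cuts text into fixed-size windows sharing a few characters with their neighbour. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraphs and sentences. The function name `sliding_chunks` is an assumption made for illustration.

```python
def sliding_chunks(text: str, chunk_size: int = 1024, chunk_overlap: int = 64) -> list[str]:
    """Cut `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`.

    Simplified stand-in for illustration only; the real splitter is separator-aware.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some characters twice.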

### Tools - LangChain

In [None]:
# Internet search tool:
search = DuckDuckGoSearchRun()
search_tool = Tool(
    name="DuckDuckGo_search",
    description="Search the web with DuckDuckGo. Input should be a search query. The tool returns text snippets from the top results.",
    func=search.run
)

tools = [search_tool]

# Python code execution tool:
python_repl = PythonREPL()
repl_tool = Tool(
    name="Python",
    description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
    func=python_repl.run,
)

tools.append(repl_tool)

# Chroma vector database tool:
manual_tool = Tool(
        name="AWS machine learning manual",
        func=lambda q: str(vectorstore.search(q, search_type='similarity')),
        description="A tool for searching and retrieving relevant information from the AWS machine learning manual"
    )

tools.append(manual_tool)
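
Each tool above is essentially a name, a description the model reads when deciding what to call, and a function that maps a string to a string. The sketch below mimics that shape in plain Python to show how a call could be routed by name. `SimpleTool`, `describe_tools`, and `run_tool` are hypothetical stand-ins, not LangChain classes or functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SimpleTool:
    """Plain-Python stand-in for a tool: name, description, and a str -> str function."""
    name: str
    description: str
    func: Callable[[str], str]


def describe_tools(tools: list[SimpleTool]) -> str:
    """Build the text block a prompt could use to list the available tools."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


def run_tool(tools: list[SimpleTool], name: str, tool_input: str) -> str:
    """Look a tool up by name and run it, returning a readable message if it is unknown."""
    registry = {t.name: t for t in tools}
    if name not in registry:
        return f"{name} is not a valid tool, try one of {sorted(registry)}."
    return registry[name].func(tool_input)


demo_tools = [SimpleTool("Echo", "Repeats the input in upper case.", str.upper)]
print(run_tool(demo_tools, "Echo", "churn"))
```

Because the model only sees the name and description, an accurate description matters as much as the function itself.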

### AI agent

In [None]:
agent_chain = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    handle_parsing_errors=True,
    verbose=True,
)
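
A zero-shot ReAct agent alternates between model text of the form `Thought / Action / Action Input` and tool observations until the model emits a final answer. The sketch below shows, in simplified form, how one such step could be parsed. It is an illustration under the assumption that the model follows that layout, not LangChain's own output parser, and `parse_react_step` is a hypothetical name.

```python
import re

# Matches "Action: <tool>" followed on a later line by "Action Input: <input>".
ACTION_RE = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action\s*Input\s*:\s*(.*)", re.DOTALL
)


def parse_react_step(text: str) -> dict:
    """Return either a final answer or the tool call requested in `text`."""
    if "Final Answer:" in text:
        return {"type": "final", "output": text.split("Final Answer:", 1)[1].strip()}

    match = ACTION_RE.search(text)
    if match is None:
        # With handle_parsing_errors=True, LangChain reports such failures
        # back to the model instead of raising.
        raise ValueError("Could not find 'Action:' and 'Action Input:' in the model output")

    return {
        "type": "action",
        "tool": match.group(1).strip(),
        "tool_input": match.group(2).strip().strip('"'),
    }


step = parse_react_step(
    'Thought: I need background first.\n\n'
    'Action: DuckDuckGo_search\n'
    'Action Input: "machine learning churn problem"'
)
print(step["tool"], "|", step["tool_input"])
```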

### Analysis prompt

In [None]:
prompt: str = """
Input:

Solve the churn problem with machine learning. First, load the data from the following GitHub link:
https://raw.githubusercontent.com/leki3n/Customer-churn-prediction-/main/churn-bigml-80.csv using the pandas command "pd.read_csv()".
While processing the data, check the dataset's data types and the balance of the target variable (Churn).
Next, create a Python code block that preprocesses the data according to each data type, and also create
a block that balances the dataset (using undersampling).
Select Logistic Regression and Random Forest from scikit-learn, plus XGBoost, to solve the churn problem.
Choose the best model using the harmonic mean of accuracy and F1.

Steps and tools:

- Internet search: Use the internet search tool to find relevant information about the churn problem
in machine learning and look for solution ideas.

- AWS machine learning manual lookup: Use the AWS PDF manual to obtain specific information
about the platform's services and features.

- Python REPL interaction: The agent uses the Python REPL to test and validate the code it generates.

- Python code generation: Based on the information collected, the agent generates Python code that solves the problem.

Output:

- Generate a Markdown file in Brazilian Portuguese with the code for every step, the output of the Python code, and a summary of the solution.


"""

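The prompt asks the agent to balance the dataset by undersampling, that is, by randomly discarding rows from the larger classes until every class is as small as the rarest one. The following is a minimal standard-library sketch of that idea, assuming rows are dictionaries with a `Churn` key; the function `undersample` and the toy rows are illustrative and do not come from the dataset.

```python
import random
from collections import defaultdict


def undersample(rows: list[dict], target: str, seed: int = 42) -> list[dict]:
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    if not rows:
        return []

    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible

    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)  # avoid leaving the classes in contiguous blocks
    return balanced


# Toy data: 8 non-churners and 2 churners (made up for the example).
toy = [{"id": i, "Churn": i >= 8} for i in range(10)]
balanced = undersample(toy, target="Churn")
print(len(balanced), sum(r["Churn"] for r in balanced))
```

With a pandas DataFrame, a comparable result could be obtained with `df.groupby("Churn").sample(n=n_min, random_state=42)`, where `n_min = df["Churn"].value_counts().min()`. Undersampling is usually applied to the training split only, so that the test set keeps the original class proportions.
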
### Running the agent

In [None]:
response = agent_chain.invoke(prompt)



> Entering new AgentExecutor chain...
Thought: I need to break down the problem into smaller tasks and use the available tools to solve each task. First, I need to capture the data from the GitHub link using pandas. Then, I need to process the data, check the data type, and balance the target variable (Churn). After that, I need to create a Python code block for preprocessing, another block for balancing the dataset, and select models from scikit-learn to solve the churn problem. Finally, I need to choose the best model using a harmonic mean between accuracy and F1 metrics.

Action: DuckDuckGo_search
Action Input: "machine learning churn problem"
Observation: Thanks to big data, forecasting customer churn with the help of machine learning is possible. Machine learning and data analysis are powerful ways to identify and predict churn. During churn prediction, you're also: Identifying at-risk customers, Identifying customer pain points, Identifying



[32;1m[1;3mI've got a good understanding of the problem and the tasks involved. Now, I need to capture the data from the GitHub link using pandas.

Action: Python
Action Input: `import pandas as pd; df = pd.read_csv('https://raw.githubusercontent.com/leki3n/Customer-churn-prediction-/main/churn-bigml-80.csv'); print(df.info())`
Observation: <class 'pandas.core.frame.DataFrame'>
RangeIndex: 2666 entries, 0 to 2665
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   State                   2666 non-null   object 
 1   Account length          2666 non-null   int64  
 2   Area code               2666 non-null   int64  
 3   International plan      2666 non-null   object 
 4   Voice mail plan         2666 non-null   object 
 5   Number vmail messages   2666 non-null   int64  
 6   Total day minutes       2666 non-null   float64
 7   Total day calls         2666 non-null   int64  
 

In [None]:
display(response['output'])

"To solve the churn problem with machine learning, I followed these steps:\n\n1. Captured the data from the GitHub link using pandas.\n2. Checked the data type and balance of the target variable (Churn).\n3. Balanced the dataset using undersampling.\n4. Created a preprocessing block of code to handle categorical and numerical features.\n5. Selected three models from scikit-learn: Logistic Regression, Random Forest, and XGBoost.\n6. Trained and evaluated each model on the balanced dataset.\n7. Calculated the harmonic mean between accuracy and F1 metrics for each model.\n8. Chose the best model based on the harmonic mean.\n\nThe best model is XGBoost with a harmonic mean of 0.880.\n\nHere is the complete Markdown file in Portuguese:\n\n# Resolução do problema de churn com machine learning\n\n## Captura dos dados\n\nUtilizamos a biblioteca pandas para capturar os dados do link do GitHub.\n\n```python\nimport pandas as pd\ndf = pd.read_csv('https://raw.githubusercontent.com/leki3n/Customer

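The summary above ranks the models by the harmonic mean of accuracy and F1, H = 2 · accuracy · F1 / (accuracy + F1), which rewards models that do well on both metrics and penalizes a large gap between them. Here is a minimal sketch of that selection rule; the function names and the scores are placeholders chosen for illustration, not results from this notebook.

```python
def harmonic_mean(accuracy: float, f1: float) -> float:
    """Harmonic mean of two scores; defined as 0.0 when both are zero."""
    if accuracy + f1 == 0:
        return 0.0
    return 2 * accuracy * f1 / (accuracy + f1)


def pick_best(scores: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Return the (name, harmonic mean) of the model with the highest combined score."""
    combined = {name: harmonic_mean(acc, f1) for name, (acc, f1) in scores.items()}
    best = max(combined, key=combined.get)
    return best, combined[best]


# Placeholder (accuracy, F1) pairs, made up for the example.
example_scores = {
    "model_a": (0.90, 0.60),
    "model_b": (0.80, 0.80),
}
print(pick_best(example_scores))
```

In this toy case `model_b` wins even though `model_a` has the higher accuracy, because the harmonic mean is pulled toward the weaker of the two metrics.
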
In [None]:
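# Write as UTF-8 so the accented Portuguese text in the output is saved correctly.
with open("output.md", "w", encoding="utf-8") as f:
    f.write(f"# LangChain Agent Output\n\n{response['output']}")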