<a href="https://colab.research.google.com/github/ychoi-kr/llm-api-prog/blob/main/4_openai/openai_assistant_with_web_search.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install openai tavily-python

Collecting tavily-python
  Downloading tavily_python-0.5.0-py3-none-any.whl.metadata (11 kB)
Collecting tiktoken>=0.5.1 (from tavily-python)
  Downloading tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Downloading tavily_python-0.5.0-py3-none-any.whl (14 kB)
Downloading tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Installing collected packages: tiktoken, tavily-python
Successfully installed tavily-python-0.5.0 tiktoken-0.8.0


In [None]:
from google.colab import userdata
from openai import OpenAI
import time
import json
from tavily import TavilyClient

In [None]:
openai_api_key = userdata.get('OPENAI_API_KEY')
tavily_api_key = userdata.get('TAVILY_API_KEY')

In [None]:
assistant_instructions = """
You create a glossary entry in Korean on a given term.

Use the web_search tool for initial research to gather and verify information from credible sources. This ensures that definitions are informed by the most recent and reliable data.

If the tool does not return any information, abort with a failure message.

Before including a URL, verify its validity and ensure it leads to the specific content being referenced. Avoid using generic homepage URLs unless they directly relate to the content. Never fabricate a fictional URL.

Instead of using honorific sentence endings (e.g. "입니다"), use the plain haera-che (해라체) endings (e.g. "이다") to maintain a direct and concise tone.

Follow output format below:
```
[Term]란 [comprehensive definition in 2-3 paragraphs].

### 참고

{% for each reference %}
- {%=reference in APA style. If the author and site name are not the same, write the author and site name separately.}
{% end for %}
```
"""

In [None]:
openai_client = OpenAI(api_key=openai_api_key)

In [None]:
tavily_client = TavilyClient(api_key=tavily_api_key)


In [None]:
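def web_search(query):
    """Search the web with Tavily and return the results as one context string."""
    # search_depth="advanced" requests a deeper search; max_tokens caps the context size.
    search_result = tavily_client.get_search_context(query, search_depth="advanced", max_tokens=8000)
    # Print the raw context so the search results are visible in the cell output.
    print(search_result)
    return search_result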
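
The assistant instructions say to abort when the tool returns no information, so it may help to make an empty search result explicit before it reaches the model. The following is a minimal sketch, not part of the original notebook: `search_or_fail` and `NO_RESULTS` are hypothetical names, and the search function is passed in as a parameter so the sketch runs without a Tavily key. It assumes the context string may be JSON-encoded twice, as the printed search output in this notebook suggests.

```python
import json

# Hypothetical marker string; the name and wording are illustrative assumptions.
NO_RESULTS = "NO_RESULTS: the search returned no information."


def search_or_fail(search_fn, query):
    """Call search_fn(query) and return an explicit marker when nothing comes back."""
    result = search_fn(query)
    # Treat None and blank strings as "no information".
    if result is None or not str(result).strip():
        return NO_RESULTS
    parsed = result
    for _ in range(2):  # Assumption: the context string may be JSON-encoded twice.
        if not isinstance(parsed, str):
            break
        try:
            parsed = json.loads(parsed)
        except ValueError:
            return result  # Not JSON; pass the raw text through unchanged.
    # An empty JSON list, object, or string also counts as "no information".
    if parsed in ([], {}, ""):
        return NO_RESULTS
    return result
```

With the notebook's own function this would be called as `search_or_fail(web_search, function_arg["query"])`.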

In [None]:
web_search_json = {
    "name": "web_search",
    "description": "Get recent information from the web.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query to use."},
        },
        "required": ["query"]
    }
}
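
The schema above marks `query` as required, but the model's arguments arrive as a raw JSON string and are not guaranteed to match it. The following is a minimal sketch of checking the required keys before calling the tool. `parse_tool_arguments` and `example_schema` are hypothetical names introduced for illustration, and the check covers only the `required` list, not full JSON Schema validation.

```python
import json


def parse_tool_arguments(schema, raw_arguments):
    """Parse a tool call's JSON arguments and check the schema's required keys."""
    args = json.loads(raw_arguments)
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    required = schema.get("parameters", {}).get("required", [])
    missing = [key for key in required if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args


# Same shape as web_search_json, repeated here so the sketch runs on its own.
example_schema = {
    "name": "web_search",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

print(parse_tool_arguments(example_schema, '{"query": "Large Multimodal Models"}'))
```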

In [None]:
assistant = openai_client.beta.assistants.create(
    name="Define it!",
    instructions=assistant_instructions,
    model="gpt-4o",
    tools=[{"type": "function", "function": web_search_json}],
)

In [None]:
thread = openai_client.beta.threads.create()

message = openai_client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Large Multimodal Models",
)

In [None]:
run = openai_client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
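
A run is processed asynchronously, so its status has to be polled until it stops changing. The following is a minimal sketch of a polling helper with a deadline, so that a stuck run cannot block the notebook indefinitely. `wait_for_status` and `TERMINAL_STATUSES` are hypothetical names, the status is fetched through a callable so the sketch runs without an API key, and the list of terminal statuses is an assumption based on the Assistants API run lifecycle.

```python
import time

# Statuses after which a run no longer changes (assumed from the run lifecycle).
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def wait_for_status(fetch_status, stop_statuses, timeout=120.0, interval=1.0):
    """Poll fetch_status() until it returns a status in stop_statuses or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in stop_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        time.sleep(interval)
```

In this notebook, `fetch_status` could be `lambda: openai_client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id).status`, with `"requires_action"` added to the stop statuses so that tool calls can be handled between waits.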

In [None]:
# Run statuses after which polling should stop.
terminal_statuses = ["completed", "failed", "cancelled", "expired", "incomplete"]

while True:
    run = openai_client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )
    run_status = run.status

    if run_status == "requires_action" and run.required_action is not None:
        # The assistant is requesting tool calls; run each one and collect the outputs.
        tools_to_call = run.required_action.submit_tool_outputs.tool_calls
        tool_output_array = []
        for tool in tools_to_call:
            tool_call_id = tool.id
            function_name = tool.function.name
            function_arg = json.loads(tool.function.arguments)
            if function_name == 'web_search':
                output = web_search(function_arg["query"])
            else:
                # Every tool call needs an output, so report unknown tools instead of failing.
                output = f"Unknown tool: {function_name}"
            tool_output_array.append({"tool_call_id": tool_call_id, "output": output})

        run = openai_client.beta.threads.runs.submit_tool_outputs(
            thread_id=thread.id,
            run_id=run.id,
            tool_outputs=tool_output_array,
        )
    elif run_status in terminal_statuses:
        break

    time.sleep(1)

"[\"{\\\"url\\\": \\\"https://arxiv.org/abs/2306.14895\\\", \\\"content\\\": \\\"This tutorial note summarizes the presentation on ``Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4'', a part of CVPR 2023 tutorial on ``Recent Advances in Vision Foundation Models''. The tutorial consists of three parts. We first introduce the background on recent GPT-like large models for vision-and-language modeling to motivate the research in instruction-tuned\\\"}\", \"{\\\"url\\\": \\\"https://aclanthology.org/2024.findings-acl.807/\\\", \\\"content\\\": \\\"The Revolution of Multimodal Large Language Models: A Survey - ACL Anthology 2024.findings-acl.807 Findings of the Association for Computational Linguistics ACL 2024 In Findings of the Association for Computational Linguistics ACL 2024, pages 13590\\\\u201313618, Bangkok, Thailand and virtual meeting. The Revolution of Multimodal Large Language Models: A Survey (Caffagni et al., Findings 2024) https://aclanthology.org/20

In [None]:
if run_status == 'completed':
    messages = openai_client.beta.threads.messages.list(
        thread_id=thread.id,
    )
    print(messages.data[0].content[0].text.value)
else:
    print(f"Run status: {run_status}")

Large Multimodal Models란 서로 다른 데이터 유형을 함께 처리하는 능력을 가진 대형 모델이다. 이러한 모델은 텍스트, 이미지, 오디오 등 다양한 입력 모달리티를 효율적으로 통합하여 보다 복합적인 이해를 가능하게 한다. 기존의 대형 언어 모델은 주로 텍스트 기반 작업에서 우수한 성능을 발휘하는 반면, 비텍스트 데이터의 처리에는 한계가 있었다. 여기서 멀티모달 모델은 여러 종류의 데이터를 결합하여 이러한 한계를 극복하고자 한다.

이러한 모델은 인공지능 연구에서 중요한 발전으로 간주되며, 텍스트와 이미지를 넘어서 오디오 및 비디오 데이터 입력까지 포함하는 방향으로 연구가 진행 중이다. 멀티모달 대형 언어 모델의 목표는 단순한 텍스트 출력을 넘어 비주얼 및 청각적 출력을 생성하는 것이다. 이와 같은 기술 발전은 인공지능의 일반적 이해 능력 향상을 위한 중요한 단계로 여겨지고 있다.

### 참고

- Caffagni et al. (2024). The Revolution of Multimodal Large Language Models: A Survey. ACL Anthology. Retrieved from https://aclanthology.org/2024.findings-acl.807
- AIMultiple. (n.d.). Large Multimodal Models vs Large Language Models. Retrieved from https://research.aimultiple.com/large-multimodal-models/
- Arxiv. (2023). Large Multimodal Models: Towards Building and Surpassing Multimodal GPT-4. Retrieved from https://arxiv.org/abs/2306.14895


In [None]:
openai_client.beta.assistants.delete(assistant.id)

AssistantDeleted(id='asst_PsEYtxtMfPlrq2aNpc77ZkVS', deleted=True, object='assistant.deleted')