In [None]:
import os

from openai import AsyncOpenAI

from agents import Agent, OpenAIChatCompletionsModel, Runner, set_tracing_disabled
from dotenv import load_dotenv

# Read GLM_API_KEY, GLM_API_BASE, and GLM_MODEL_NAME from a local .env file.
load_dotenv()


GLM_API_KEY = os.getenv("GLM_API_KEY")
GLM_API_BASE = os.getenv("GLM_API_BASE")
GLM_MODEL_NAME = os.getenv("GLM_MODEL_NAME")

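# OpenAI-compatible async client pointed at the GLM endpoint.
client = AsyncOpenAI(
    api_key=GLM_API_KEY,
    base_url=GLM_API_BASE,
)
# Tracing exports to OpenAI's backend and needs an OpenAI API key,
# so turn it off when talking to a different provider.
set_tracing_disabled(disabled=True)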
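
The setup above reads three environment variables, and `os.getenv` returns `None` for any that are unset, so a missing value would only surface later as a request error. A minimal sketch of an early check follows; `require_env` is a hypothetical helper name, not part of the `agents` or `openai` packages.

```python
import os


def require_env(names, env=None):
    """Return {name: value} for each name; raise if any is unset or empty.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}
```

It could be called right after `load_dotenv()`, for example `cfg = require_env(["GLM_API_KEY", "GLM_API_BASE", "GLM_MODEL_NAME"])`.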


In [None]:

URL = "https://i-blog.csdnimg.cn/direct/2c2d692370ea470d975f50e0ab13fd46.png"


async def main():
    agent = Agent(
        name="Assistant",
        instructions="You are a helpful assistant.",
        # Send requests through the Chat Completions API of the GLM endpoint.
        model=OpenAIChatCompletionsModel(
            model=GLM_MODEL_NAME,
            openai_client=client,
        ),
    )

    # First message carries the image by URL; the second asks the question.
    result = await Runner.run(
        agent,
        [
            {
                "role": "user",
                "content": [{"type": "input_image", "image_url": URL}],
            },
            {
                "role": "user",
                "content": "What do you see in this image?",
            },
        ],
    )
    print(result.final_output)


await main()


The image shows a terminal interface related to installing the Google Gemini CLI tool. Here’s what’s visible:

1. **Terminal Commands & Output**:  
   - A command is run (`npx https://github.com/google-gemini/gemini-cli`) under the directory `/mnt/d/agent-llm/gemini_cli_test`.  
   - Text indicating the package to install (`github:google-gemini/gemini-cli`) and a confirmation prompt “Ok to proceed? (y) y” (where “y” confirms proceeding).  

2. **Visual Element**:  
   - A large, pixelated text “> GEMINI” with letters in gradient colors (blue, purple, pink) centered below the terminal output.  

3. **Tips Section**:  
   - A list titled “Tips for getting started”:  
     1. Ask questions, edit files, or run commands.  
     2. Be specific for the best results.  
     3. Create `GEMINI.md` files to customize interactions with Gemini.  
     4. Use `/help` for more information.  

4. **Attribution**:  
   - In the bottom-right corner, it says “CSDN @ZHOU_CAMP”, indicating the source or au

In [5]:

URL = "https://cdn.bigmodel.cn/markdown/1751371876183image.png?attname=image.png"


async def main():
    agent = Agent(
        name="Assistant",
        instructions="You are a helpful assistant.",
        # Send requests through the Chat Completions API of the GLM endpoint.
        model=OpenAIChatCompletionsModel(
            model=GLM_MODEL_NAME,
            openai_client=client,
        ),
    )

    # First message carries the image by URL; the second asks the question.
    result = await Runner.run(
        agent,
        [
            {
                "role": "user",
                "content": [{"type": "input_image", "image_url": URL}],
            },
            {
                "role": "user",
                "content": "What do you see in this image?",
            },
        ],
    )
    print(result.final_output)


await main()


The image shows a **comparative performance table** of various large language and multimodal models across multiple benchmark tests. The table is structured as follows:

- **Rows**: Different categories of benchmarks (e.g., General VQA, STEM, OCR & Chart, Long Document, Visual Grounding, GUI Agents, Coding, Video Understanding) and specific benchmark names within each category.
- **Columns**: Different models (e.g., GLM - 4.1V - Thinking - 9B, Qwen2.5 - VL 7B, InternVL3 9B, Kimi - VL A3B - Thinking, MiMo - VL 7B - RL, Qwen2.5 - VL 72B, GPT - 40 2024 - 11 - 20).
- **Cells**: Numerical performance scores (likely accuracy, F1, or other metrics) for each model on each benchmark. Some cells have symbols (* , †) which may indicate statistical significance or special notes.

In summary, the image presents a detailed comparison of how these AI models perform across diverse tasks (like visual question answering, STEM problems, coding, video understanding, etc.).
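
Both cells pass `Runner.run` the same two-message input and differ only in the image URL. A small sketch of factoring that structure into a function is shown here; `build_image_prompt` is a hypothetical name introduced for illustration, and the message shape is copied from the cells in this notebook.

```python
def build_image_prompt(image_url, question):
    """Build the two-message input used in this notebook.

    The first user message holds the image as an `input_image` content part,
    the second holds the text question.
    """
    return [
        {
            "role": "user",
            "content": [{"type": "input_image", "image_url": image_url}],
        },
        {
            "role": "user",
            "content": question,
        },
    ]
```

With this helper, each cell would reduce to something like `await Runner.run(agent, build_image_prompt(URL, "What do you see in this image?"))`.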
