
The reply text rendered by the streaming chatbot page is incomplete #10731

Closed
wuxianyess opened this issue Mar 5, 2025 · 8 comments
Labels: bug (Something isn't working), needs repro (Awaiting full reproduction)

Comments

@wuxianyess

Describe the bug

The reply text rendered by the streaming chatbot page is incomplete: part of the content yielded via `yield response_message, state` is missing from the rendered reply.

Have you searched existing issues? 🔎

  • I have searched and found no existing issues

Reproduction

```python
import gradio as gr

yield response_message, state
```

Screenshot

No response

Logs

System Info

Name: gradio
Version: 5.20.0

Severity

Blocking usage of gradio

@wuxianyess wuxianyess added the bug Something isn't working label Mar 5, 2025
@abidlabs (Member)

abidlabs commented Mar 5, 2025

What exactly is the issue? The description is not clear and there is no minimal code example that we can use to reproduce the issue. See: https://stackoverflow.com/help/minimal-reproducible-example
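
(For reference, a minimal reproducible example of a streaming `gr.ChatInterface`, i.e. the kind of self-contained script being requested here, might look like the following sketch; the function name and reply text are illustrative, not taken from the reporter's code.)

```python
import time

import gradio as gr

def stream_reply(message, history):
    # Build the reply incrementally; each yield replaces the rendered reply.
    text = ""
    for ch in "streamed reply":
        text += ch
        time.sleep(0.05)
        yield text

gr.ChatInterface(stream_reply, type="messages").launch()
```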

@abidlabs abidlabs added the needs repro Awaiting full reproduction label Mar 5, 2025
@wuxianyess (Author)

> What exactly is the issue? The description is not clear and there is no minimal code example that we can use to reproduce the issue. See: https://stackoverflow.com/help/minimal-reproducible-example

I built a streaming chatbot web page with Gradio 5. The replies in the dialog box were incomplete compared with the replies printed by the code: the page would render part of a reply and then "swallow" it back. The code is as follows:

@abidlabs (Member)

abidlabs commented Mar 5, 2025

The code looks incomplete / not formatted correctly, can you please provide complete code?

@wuxianyess (Author)

> What exactly is the issue? The description is not clear and there is no minimal code example that we can use to reproduce the issue. See: https://stackoverflow.com/help/minimal-reproducible-example

Also, even when `print("wait_time is:", wait_time)` executes, the web page does not render the streamed reply in sync.
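
(A hedged illustration of the reported symptom, assuming standard `gr.ChatInterface` streaming semantics: each yield replaces the currently rendered reply, so yielding a shorter value makes earlier text disappear.)

```python
import gradio as gr

def fn(message, history):
    yield "partial rep"     # rendered in the chat bubble
    yield "part"            # display shrinks back: text appears "swallowed"
    yield "partial reply."  # final text
    # If the last yield is shorter than an earlier one, the missing tail
    # never reappears, matching "partially generated and then swallowed back".

gr.ChatInterface(fn, type="messages").launch()
```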

@wuxianyess (Author)

```python
# Note: read_frames_decord, seconds_to_minutes_seconds,
# seconds_to_minutes_secondswithdot, client, and MODEL_NAME are defined
# elsewhere and were not included in this report.
import base64
import io
from dataclasses import dataclass, field

import gradio as gr
from PIL import Image


def call_api(
    user_message,
    messages,
    state,
):
    msg = {"role": "user", "content": [{"type": "text", "text": user_message["text"]}]}
    print("user_message[text] is:", user_message["text"])
    if "files" in user_message and user_message["files"]:
        frames, duration, fps = read_frames_decord(user_message["files"][0], 256)

        frame_num = len(frames)
        video_duration = duration  # video duration in seconds
        sample_fps = fps           # sampled frames per second
        sample_timestamp = [round(i / sample_fps, 2) for i in range(frame_num)]  # timestamp of each frame
        temporal_instruction = (
            f"The time range of this video is [00:00-{seconds_to_minutes_seconds(video_duration)}], "
            f"and the following is a series of {frame_num} frames sampled at {round(sample_fps, 1)} FPS.\n"
        )
        assert frame_num == len(sample_timestamp)
        special_tokens = '\n'.join([
            f'Frame{i + 1} is at [{seconds_to_minutes_secondswithdot(frame_time)}]:<IMAGE_TOKEN>'
            for i, frame_time in enumerate(sample_timestamp)
        ])

        pre_image_tokens = temporal_instruction + special_tokens + "\n"
        question = pre_image_tokens + user_message["text"]
        msg["content"] = [{'type': 'text', 'text': question}]

        # Convert frames to base64
        frames_b64 = []
        for frame in frames:
            # Convert numpy array to PIL Image
            pil_image = Image.fromarray(frame)
            # Save image to bytes buffer
            buffer = io.BytesIO()
            pil_image.save(buffer, format="JPEG")
            # Convert to base64
            b64_str = base64.b64encode(buffer.getvalue()).decode("utf-8")
            frames_b64.append(b64_str)
            msg["content"].append(
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64_str}", 'max_dynamic_patch': 0, 'use_thumbnail': True}}
            )
    state.history.append(msg)

    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=state.history,
        stream=True,
        # max_tokens=2048,
        max_tokens=8192,  # think
        temperature=0.5,
        top_p=1.0,
        # frequency_penalty=1.05
        frequency_penalty=1.1,
    )

    assistant_message = {"role": "assistant", "content": ""}
    state.history.append(assistant_message)

    has_think = False
    response_message = []
    generated_text = ""
    wait_time = 0
    ct = 6

    for chunk in response:
        if chunk.choices[0].delta.content:
            wait_time += 1
            if wait_time < ct:
                # Buffer the first few chunks while also rendering them.
                generated_text += chunk.choices[0].delta.content
                if not response_message:
                    response_message.append(gr.ChatMessage(role="assistant", content=""))
                response_message[-1].content += chunk.choices[0].delta.content
            elif wait_time == ct:
                # The message list is rebuilt from the buffer here, so the next
                # yield replaces everything rendered so far. Note that this
                # branch never appends the current chunk's delta content.
                response_message = []
                if generated_text.startswith("<think>"):  # Start of assistant thinking
                    has_think = True
                    response_message.append(
                        gr.ChatMessage(
                            role="assistant",
                            content="",
                            metadata={"title": "⏳Thinking:"},
                        )
                    )
                    generated_text = generated_text.split("<think>")[1]
                else:
                    response_message.append(gr.ChatMessage(role="assistant", content=""))
                response_message[-1].content += generated_text
            else:
                if has_think and "</think>" in response_message[-1].content:
                    parts = response_message[-1].content.split("</think>")
                    response_message[-1].content = parts[0]
                    response_message.append(gr.ChatMessage(role="assistant", content=""))
                    if len(parts) > 1:
                        response_message[-1].content += parts[1]
                    has_think = False
                else:
                    response_message[-1].content += chunk.choices[0].delta.content

            print("wait_time is:", wait_time)
            # time.sleep(0.1)
            yield response_message, state

    # The original compared against `time` (the module); `ct` appears intended.
    if not response_message and wait_time < ct:
        response_message.append(gr.ChatMessage(role="assistant", content=generated_text))

    state.history[-1]["content"] = response_message[-1].content

    print("1 final response_message[-1].content:", response_message[-1].content)
    print("state.history[-1] is:", state.history[-1])
    print("2 final response_message[-1].content:", response_message[-1].content)

    # time.sleep(0.1)
    yield response_message, state


@dataclass
class TokenUsage:
    user: int = 0
    assistant: int = 0
    total: int = 0


@dataclass
class TokenStat:
    used: list[TokenUsage] = field(default_factory=list)
    used_count: int = 0


@dataclass
class StateData:
    history: list = field(default_factory=list)
    # The original assigned the class itself (`token_stat: TokenStat = TokenStat`).
    token_stat: TokenStat = field(default_factory=TokenStat)


state = gr.State(StateData())
demo = gr.ChatInterface(
    call_api,
    type="messages",
    multimodal=True,
    textbox=gr.MultimodalTextbox(file_types=["video"], file_count="single"),
    flagging_mode="manual",
    flagging_options=["👍Like", "🗑️Spam", "👎Inappropriate", "🤷Other"],
    save_history=True,
    additional_inputs=state,
    additional_inputs_accordion=gr.Accordion(visible=False),
    additional_outputs=state,
)

if __name__ == "__main__":
    demo.queue().launch(
        # share=False,
        share=True,
        inbrowser=False,
        server_port=8023,
        server_name="0.0.0.0",
    )
```
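
(For reference, a stripped-down, runnable skeleton of the streaming logic above: a sketch that substitutes a hypothetical canned chunk list, `CHUNKS`, for the OpenAI stream and drops the video and `<think>` handling, so the rendering behavior can be exercised without the missing helpers.)

```python
import time

import gradio as gr

CHUNKS = list("this is a streamed reply")  # hypothetical stand-in for the model stream

def call_api(user_message, history):
    response_message = []
    buffered = ""
    ct, n = 6, 0
    for piece in CHUNKS:
        n += 1
        if n < ct:
            # Buffer and render the early chunks, as in the full code.
            buffered += piece
            if not response_message:
                response_message.append(gr.ChatMessage(role="assistant", content=""))
            response_message[-1].content += piece
        elif n == ct:
            # Same reset as the full code; `piece` itself is never appended
            # in this branch, so one chunk goes missing from the reply.
            response_message = [gr.ChatMessage(role="assistant", content=buffered)]
        else:
            response_message[-1].content += piece
        time.sleep(0.05)
        yield response_message

demo = gr.ChatInterface(
    call_api,
    type="messages",
    multimodal=True,
    textbox=gr.MultimodalTextbox(file_types=["video"], file_count="single"),
)

if __name__ == "__main__":
    demo.queue().launch()
```

(Run this way, the skeleton also makes visible that the `n == ct` branch drops the current chunk, which is consistent with the reply rendering as incomplete.)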

@wuxianyess (Author)

> The code looks incomplete / not formatted correctly, can you please provide complete code?

That's all I can offer.

@wuxianyess (Author)

> The code looks incomplete / not formatted correctly, can you please provide complete code?

(Reposts the same `call_api` code shown in the previous comment.)

Hello, do you have time to take a look?

@abidlabs (Member)

abidlabs commented Mar 7, 2025

Hi, this is not a properly formatted or complete reproduction, so I'll close this issue.

@abidlabs closed this as not planned Mar 7, 2025