
[Bug]: Phi-3-Small model reporting AttributeError: 'NoneType' object has no attribute 'prefill_metadata' #19665

@dnaihao

Description

Your current environment

I am using vLLM 0.9.1, with flash-attn installed via:
MAX_JOBS=4 pip install flash-attn --no-build-isolation
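
For reference, the installed versions can be confirmed from Python (this check is my addition; both packages expose __version__):

import vllm
import flash_attn

print(vllm.__version__)       # expect 0.9.1
print(flash_attn.__version__)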

🐛 Describe the bug

When I try to run the Phi-3 Small models (both the 8k and 128k variants), they fail with:

AttributeError: 'NoneType' object has no attribute 'prefill_metadata'

specifically in vllm/attention/backends/blocksparse_attn.py, line 416, where the code accesses prefill_metadata via if prefill_meta := attn_metadata.prefill_metadata:.
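
A minimal script that reproduces the error for me (assuming the Hugging Face model id microsoft/Phi-3-small-8k-instruct; Phi-3 Small needs trust_remote_code=True, and the error surfaces while serving the first request):

from vllm import LLM, SamplingParams

# The 128k variant (microsoft/Phi-3-small-128k-instruct) hits the same error.
llm = LLM(model="microsoft/Phi-3-small-8k-instruct", trust_remote_code=True)
params = SamplingParams(max_tokens=32)

# The AttributeError is raised inside the blocksparse attention backend
# while processing this first (prefill) request.
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)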

I tried to trace the error myself: it goes all the way into phi3_small.py, where the model calls self.attn in the forward pass, but I cannot find where attn_metadata gets filled in. I saw from other posts that the model uses a context manager to handle this, which I am not very familiar with.
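
For reference, a simplified sketch of what I understand that context-manager pattern to be (the names here are illustrative stand-ins, not the actual vLLM code): the model runner sets the metadata around each forward pass and the attention layer reads it back, so if the layer runs outside that context, or the runner sets None, attn_metadata is None and accessing attn_metadata.prefill_metadata raises exactly this AttributeError.

from contextlib import contextmanager
from typing import Any, Optional

_forward_metadata: Optional[Any] = None  # metadata for the current forward pass

@contextmanager
def set_forward_context(attn_metadata: Any):
    """Entered by the model runner around each forward pass (illustrative)."""
    global _forward_metadata
    prev = _forward_metadata
    _forward_metadata = attn_metadata
    try:
        yield
    finally:
        _forward_metadata = prev

def get_forward_context() -> Optional[Any]:
    """Read by the attention layers during the forward pass (illustrative)."""
    return _forward_metadata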

Also, other models such as Llama do not have this issue. I suspect it is related to the attention type: Phi-3 Small goes through torch.ops.vllm.unified_attention in vllm/attention/layer.py, whereas the Llama model never hits that line.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
