[Bug]: llm_strategy not working #707

@ghost

Description

crawl4ai version

Version: 0.4.248

Expected Behavior

I expect result.extracted_content to contain data structured according to my Pydantic model.

Current Behavior

The code does not appear to reach the OpenAI LLM on macOS. To rule out a problem with the API itself, I called the OpenAI LLM from a standalone Python script that does not use crawl4ai, and it returned a successful result.
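One thing that may be worth ruling out is whether the API key is actually visible to the process that runs the crawler. The following is a minimal sketch using only the standard library; `require_env` and `mask` are hypothetical helper names introduced here for illustration, not part of crawl4ai or the OpenAI client, and nothing in this report confirms that a missing key is the cause.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set in this process's environment")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value can be logged safely."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Calling `print(mask(require_env("OPENAI_API_KEY")))` right after `load_dotenv(...)` would fail loudly if the key was not loaded, instead of silently passing `None` as `api_token`.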

Is this reproducible?

Yes

Inputs Causing the Bug

Steps to Reproduce

Run the code in the Code snippets section. The program prints None for result.extracted_content.

Code snippets

from pydantic import BaseModel, Field
from crawl4ai.extraction_strategy import LLMExtractionStrategy
from crawl4ai import AsyncWebCrawler, CacheMode
from typing import List, Optional
import asyncio
import os
from dotenv import load_dotenv
import json
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

load_dotenv("/Users/stevenheadley/.zshrc")

# Define schema
class Listing(BaseModel):
    Title: str = Field(..., description="The title of the listing")
    Price: str = Field(..., description="The price of the listing")
    Bedrooms: str = Field(..., description="The number of bedrooms for the listing")
    Bathrooms: str = Field(..., description="The number of bathrooms for the listing")
    Area: str = Field(..., description="The square footage of the listing")
    Description: str = Field(..., description="The description of the listing")
    Features: str = Field(..., description="The features of the listing")
    content: Optional[str] = Field(..., description="The complete content of the listing.")

# Create strategy
strategy = LLMExtractionStrategy(
    provider="openai/gpt-4o",
    api_token=os.getenv("OPENAI_API_KEY"),
    schema=Listing.model_json_schema(),
    extraction_type='schema',
    instruction="Extract 'Title', 'Price', 'Bedrooms', 'Bathrooms', 'Area', 'Description', and 'Features' from the content and return the list of listings.",
    verbose=True
)

async def main():
    browser_config = BrowserConfig(
        viewport_width=1280,
        viewport_height=720,
        browser_type="chromium",
        headless=True,  # set to False to see the browser
        verbose=True,
    )
    
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        remove_overlay_elements=True,
        exclude_external_links=True,
        exclude_social_media_links=True,
        excluded_tags=["nav", "header", "footer"],
        remove_forms=True,
        verbose=True,
        session_id='1234'
    )

   

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://barrierreefrealty.com/property/coconuts-caribe-phase-ii-2-bedroom/",
            config=run_config,
            verbose=True,
            extraction_strategy=strategy
        )
#        data = json.loads(result.extracted_content)
        print(result.extracted_content)  # Expected: extracted JSON; actual: None

if __name__ == "__main__":
    asyncio.run(main())
    # Access extracted data
#    data = json.loads(result.extracted_content)
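A note for anyone re-enabling the commented-out json.loads call: with the behavior reported here, result.extracted_content is None, and json.loads(None) raises a TypeError. Below is a minimal sketch of a guard, assuming the extracted content is a JSON string whenever it is present. `parse_extracted` is a hypothetical helper name, not part of crawl4ai.

```python
import json
from typing import Any, List, Optional


def parse_extracted(extracted_content: Optional[str]) -> List[Any]:
    """Return the extracted items as a list; an empty list if nothing was extracted."""
    if not extracted_content:
        # Covers both None (the case in this report) and an empty string.
        return []
    data = json.loads(extracted_content)
    # Normalize a single JSON object into a one-element list.
    return data if isinstance(data, list) else [data]
```

With this in place, `parse_extracted(result.extracted_content)` yields an empty list for the failing run instead of raising, which makes the "no data extracted" case explicit.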

OS

macOS

Python version

3.9.4

Browser

Chrome

Browser version

132.0.6834.83

Error logs & Screenshots (if applicable)

No errors, just empty results

Metadata

Assignees

No one assigned

    Labels

    🐞 Bug: Something isn't working
    🩺 Needs Triage: Needs attention of maintainers

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
