crawl4ai version
Version: 0.4.248
Expected Behavior
I expect result.extracted_content to contain data extracted according to my Pydantic model.
Current Behavior
The code does not appear to reach the OpenAI LLM on macOS. To rule out a problem with the API key or the connection, I ran a separate Python script that calls the OpenAI LLM directly, without crawl4ai, and it returned a successful result.
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Run the code below. The program prints None for result.extracted_content.
Code snippets
from pydantic import BaseModel, Field
from crawl4ai.extraction_strategy import LLMExtractionStrategy
from crawl4ai import AsyncWebCrawler, CacheMode
from typing import List, Optional
import asyncio
import os
from dotenv import load_dotenv
import json
from crawl4ai.async_configs import BrowserConfig, CrawlerRunConfig

# Load OPENAI_API_KEY from the shell profile
load_dotenv("/Users/stevenheadley/.zshrc")


# Define schema
class Listing(BaseModel):
    Title: str = Field(..., description="The title of the listing")
    Price: str = Field(..., description="The price of the listing")
    Bedrooms: str = Field(..., description="The number of bedrooms for the listing")
    Bathrooms: str = Field(..., description="The number of bathrooms for the listing")
    Area: str = Field(..., description="The square footage of the listing")
    Description: str = Field(..., description="The description of the listing")
    Features: str = Field(..., description="The features of the listing")
    content: Optional[str] = Field(..., description="The complete content of the listing.")


# Create strategy
strategy = LLMExtractionStrategy(
    provider="openai/gpt-4o",
    api_token=os.getenv("OPENAI_API_KEY"),
    schema=Listing.model_json_schema(),
    extraction_type='schema',
    instruction="Extract 'Title', 'Price', 'Bedrooms', 'Bathrooms', 'Area', 'Description', and 'Feature' from the content and return the list of listing.",
    verbose=True
)


async def main():
    # Set headless=False to see the browser
    browser_config = BrowserConfig(
        viewport_width=1280,
        viewport_height=720,
        browser_type="chromium",
        headless=True,
        verbose=True
    )
    run_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        remove_overlay_elements=True,
        exclude_external_links=True,
        exclude_social_media_links=True,
        excluded_tags=["nav", "header", "footer"],
        remove_forms=True,
        verbose=True,
        session_id='1234'
    )
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://barrierreefrealty.com/property/coconuts-caribe-phase-ii-2-bedroom/",
            config=run_config,
            verbose=True,
            extraction_strategy=strategy
        )
        # data = json.loads(result.extracted_content)
        print(result.extracted_content)  # Prints None instead of the extracted JSON


if __name__ == "__main__":
    asyncio.run(main())
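Because result.extracted_content comes back as None, the commented-out json.loads call in the snippet would raise a TypeError if it were enabled. Here is a minimal sketch of a guard, assuming extracted_content is either None or a JSON-encoded string. The helper name parse_extracted is hypothetical and not part of crawl4ai.

```python
import json
from typing import Any, List, Optional


def parse_extracted(extracted_content: Optional[str]) -> List[Any]:
    """Return extracted items as a list, or [] when nothing was extracted.

    Hypothetical helper (not part of crawl4ai). Assumes extracted_content
    is either None or a JSON-encoded string.
    """
    if not extracted_content:
        # None or empty string: the extraction step produced nothing.
        return []
    data = json.loads(extracted_content)
    # Normalise a single JSON object to a one-element list.
    return data if isinstance(data, list) else [data]
```

Inside main(), calling parse_extracted(result.extracted_content) would return an empty list for the run described here instead of failing on None.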
OS
macOS
Python version
3.9.4
Browser
Chrome
Browser version
132.0.6834.83
Error logs & Screenshots (if applicable)
No errors, just empty results
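Since the run produces no error output, one check that might help narrow this down is whether OPENAI_API_KEY is actually set after load_dotenv runs, because os.getenv returns None silently when the variable is missing. A small sketch using only the standard library follows. The helper name describe_env_key is hypothetical, and it masks the value so the output can be pasted into a report.

```python
import os


def describe_env_key(name: str) -> str:
    """Report whether an environment variable is set without revealing it.

    Hypothetical diagnostic helper, not part of crawl4ai or python-dotenv.
    """
    value = os.getenv(name)
    if not value:
        return f"{name} is NOT set"
    # Show only the last 4 characters so the secret is not leaked in logs.
    return f"{name} is set ({len(value)} chars, ends with ...{value[-4:]})"
```

Calling print(describe_env_key("OPENAI_API_KEY")) right after the load_dotenv line would show whether the key passed to LLMExtractionStrategy is present.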