[Self-Host] /crawl endpoint failing with ollama / llama3.2:1b #1294

Open
anuragphadke opened this issue Mar 4, 2025 · 0 comments

Describe the Issue
I am running Firecrawl self-hosted, following the Docker build instructions. The /scrape endpoint runs without any issues, but the following request does not:

curl --location 'http://localhost:3002/v1/extract' \
--header 'Content-Type: application/json' \
--data '{
    "urls": [
        "https://news.ycombinator.com/"
    ],
    "prompt": "find top stories in technology sector."
}'

returns:

Error parsing schema analysis {"error":{"name":"AI_NoObjectGeneratedError","cause":{"name":"AI_TypeValidationError","cause":{"issues":[{"code":"invalid_type","expected":"boolean","received":"undefined","path":["isMultiEntity"],"message":"Required"},{"code":"invalid_type","expected":"string","received":"undefined","path":["reasoning"],"message":"Required"},{"code":"invalid_type","expected":"array","received":"undefined","path":["keyIndicators"],"message":"Required"}],"name":"ZodError","message":"[\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"boolean\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"isMultiEntity\"\n    ],\n    \"message\": \"Required\"\n  },\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"string\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"reasoning\"\n    ],\n    \"message\": \"Required\"\n  },\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"array\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"keyIndicators\"\n    ],\n    \"message\": \"Required\"\n  }\n]","stack":"ZodError: [\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"boolean\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"isMultiEntity\"\n    ],\n    \"message\": \"Required\"\n  },\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"string\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"reasoning\"\n    ],\n    \"message\": \"Required\"\n  },\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"array\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"keyIndicators\"\n    ],\n    \"message\": \"Required\"\n  }\n]\n    at get error (/app/node_modules/.pnpm/zod@3.24.2/node_modules/zod/lib/types.js:55:31)\n    at Object.validate (/app/node_modules/.pnpm/@ai-sdk+ui-utils@1.1.15_zod@3.24.2/node_modules/@ai-sdk/ui-utils/dist/index.js:1527:105)\n    at safeValidateTypes 
(/app/node_modules/.pnpm/@ai-sdk+provider-utils@2.1.9_zod@3.24.2/node_modules/@ai-sdk/provider-utils/dist/index.js:391:31)\n    at Object.validateFinalResult (/app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:2115:57)\n    at processResult (/app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:2756:49)\n    at fn (/app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:2787:21)\n    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n    at async /app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:548:22\n    at async generateCompletions (/app/dist/src/scraper/scrapeURL/transformers/llmExtract.js:233:24)\n    at async analyzeSchemaAndPrompt (/app/dist/src/lib/extract/completions/analyzeSchemaAndPrompt.js:24:49)\n    at async performExtraction (/app/dist/src/lib/extract/extraction-service.js:151:113)\n    at async processExtractJobInternal (/app/dist/src/services/queue-worker.js:249:24)"},"value":{"$schema":"http://json-schema.org/draft-07/schema#","title":"Top Stories in Technology Sector","type":"object","properties":{"results":{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"author":{"type":"string"},"url":{"type":"string"},"publishedAt":{"type":"string","format":"date-time"}},"required":["title","author","url","publishedAt"]}},"pagination":{"type":"object","properties":{"totalItems":{"type":"integer"},"currentPage":{"type":"integer","minimum":1}},"required":["totalItems","currentPage"]},"paginationMeta":{"type":"object","properties":{"currentPage":{"type":"integer"}},"required":["currentPage"]},"paginationMetaTotalItems":{"type":"integer","minimum":0},"paginationMetaPageNumber":{"type":"integer","minimum":1}},"definitions":{"result":{"type":"object"}}},"message":"Type validation failed: Value: {\"$schema\":\"http://json-schema.org/draft-07/schema#\",\"title\":\"Top 
Stories in Technology Sector\",\"type\":\"object\",\"properties\":{\"results\":{\"type\":\"array\",\"items\":{\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\"},\"author\":{\"type\":\"string\"},\"url\":{\"type\":\"string\"},\"publishedAt\":{\"type\":\"string\",\"format\":\"date-time\"}},\"required\":[\"title\",\"author\",\"url\",\"publishedAt\"]}},\"pagination\":{\"type\":\"object\",\"properties\":{\"totalItems\":{\"type\":\"integer\"},\"currentPage\":{\"type\":\"integer\",\"minimum\":1}},\"required\":[\"totalItems\",\"currentPage\"]},\"paginationMeta\":{\"type\":\"object\",\"properties\":{\"currentPage\":{\"type\":\"integer\"}},\"required\":[\"currentPage\"]},\"paginationMetaTotalItems\":{\"type\":\"integer\",\"minimum\":0},\"paginationMetaPageNumber\":{\"type\":\"integer\",\"minimum\":1}},\"definitions\":{\"result\":{\"type\":\"object\"}}}.\nError message: [\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"boolean\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"isMultiEntity\"\n    ],\n    \"message\": \"Required\"\n  },\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"string\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"reasoning\"\n    ],\n    \"message\": \"Required\"\n  },\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"array\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"keyIndicators\"\n    ],\n    \"message\": \"Required\"\n  }\n]","stack":"AI_TypeValidationError: Type validation failed: Value: {\"$schema\":\"http://json-schema.org/draft-07/schema#\",\"title\":\"Top Stories in Technology 
Sector\",\"type\":\"object\",\"properties\":{\"results\":{\"type\":\"array\",\"items\":{\"type\":\"object\",\"properties\":{\"title\":{\"type\":\"string\"},\"author\":{\"type\":\"string\"},\"url\":{\"type\":\"string\"},\"publishedAt\":{\"type\":\"string\",\"format\":\"date-time\"}},\"required\":[\"title\",\"author\",\"url\",\"publishedAt\"]}},\"pagination\":{\"type\":\"object\",\"properties\":{\"totalItems\":{\"type\":\"integer\"},\"currentPage\":{\"type\":\"integer\",\"minimum\":1}},\"required\":[\"totalItems\",\"currentPage\"]},\"paginationMeta\":{\"type\":\"object\",\"properties\":{\"currentPage\":{\"type\":\"integer\"}},\"required\":[\"currentPage\"]},\"paginationMetaTotalItems\":{\"type\":\"integer\",\"minimum\":0},\"paginationMetaPageNumber\":{\"type\":\"integer\",\"minimum\":1}},\"definitions\":{\"result\":{\"type\":\"object\"}}}.\nError message: [\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"boolean\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"isMultiEntity\"\n    ],\n    \"message\": \"Required\"\n  },\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"string\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"reasoning\"\n    ],\n    \"message\": \"Required\"\n  },\n  {\n    \"code\": \"invalid_type\",\n    \"expected\": \"array\",\n    \"received\": \"undefined\",\n    \"path\": [\n      \"keyIndicators\"\n    ],\n    \"message\": \"Required\"\n  }\n]\n    at _TypeValidationError.wrap (/app/node_modules/.pnpm/@ai-sdk+provider@1.0.8/node_modules/@ai-sdk/provider/dist/index.js:367:86)\n    at safeValidateTypes (/app/node_modules/.pnpm/@ai-sdk+provider-utils@2.1.9_zod@3.24.2/node_modules/@ai-sdk/provider-utils/dist/index.js:397:51)\n    at Object.validateFinalResult (/app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:2115:57)\n    at processResult (/app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:2756:49)\n    at fn 
(/app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:2787:21)\n    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n    at async /app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:548:22\n    at async generateCompletions (/app/dist/src/scraper/scrapeURL/transformers/llmExtract.js:233:24)\n    at async analyzeSchemaAndPrompt (/app/dist/src/lib/extract/completions/analyzeSchemaAndPrompt.js:24:49)\n    at async performExtraction (/app/dist/src/lib/extract/extraction-service.js:151:113)\n    at async processExtractJobInternal (/app/dist/src/services/queue-worker.js:249:24)"},"text":"{\n  \"$schema\": \"http://json-schema.org/draft-07/schema#\",\n  \"title\": \"Top Stories in Technology Sector\",\n  \"type\": \"object\",\n  \"properties\": {\n    \"results\": {\n      \"type\": \"array\",\n      \"items\": {\n        \"type\": \"object\",\n        \"properties\": {\n          \"title\": {\n            \"type\": \"string\"\n          },\n          \"author\": {\n            \"type\": \"string\"\n          },\n          \"url\": {\n            \"type\": \"string\"\n          },\n          \"publishedAt\": {\n            \"type\": \"string\",\n            \"format\": \"date-time\"\n          }\n        },\n        \"required\": [\"title\", \"author\", \"url\", \"publishedAt\"]\n      }\n    },\n    \"pagination\": {\n      \"type\": \"object\",\n      \"properties\": {\n        \"totalItems\": {\n          \"type\": \"integer\"\n        },\n        \"currentPage\": {\n          \"type\": \"integer\",\n          \"minimum\": 1\n        }\n      },\n      \"required\": [\"totalItems\", \"currentPage\"]\n    },\n    \"paginationMeta\": {\n      \"type\": \"object\",\n      \"properties\": {\n        \"currentPage\": {\n          \"type\": \"integer\"\n        }\n      },\n      \"required\": [\"currentPage\"]\n    },\n    \"paginationMetaTotalItems\": {\n      \"type\": 
\"integer\",\n      \"minimum\": 0\n    },\n    \"paginationMetaPageNumber\": {\n      \"type\": \"integer\",\n      \"minimum\": 1\n    }\n  },\n  \"definitions\": {\n    \"result\": {\n      \"type\": \"object\"\n    }\n  }\n}","response":{"id":"aiobj-UmVovydj3N7EmdPS8Wl7bOLc","timestamp":"2025-03-05T01:47:39.766Z","modelId":"llama3.2:1b"},"usage":{"promptTokens":788,"completionTokens":601,"totalTokens":1389},"message":"No object generated: response did not match schema.","stack":"AI_NoObjectGeneratedError: No object generated: response did not match schema.\n    at processResult (/app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:2765:17)\n    at fn (/app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:2787:21)\n    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n    at async /app/node_modules/.pnpm/ai@4.1.45_react@18.3.1_zod@3.24.2/node_modules/ai/dist/index.js:548:22\n    at async generateCompletions (/app/dist/src/scraper/scrapeURL/transformers/llmExtract.js:233:24)\n    at async analyzeSchemaAndPrompt (/app/dist/src/lib/extract/completions/analyzeSchemaAndPrompt.js:24:49)\n    at async performExtraction (/app/dist/src/lib/extract/extraction-service.js:151:113)\n    at async processExtractJobInternal (/app/dist/src/services/queue-worker.js:249:24)"}}
worker-1              | 2025-03-05 01:47:46 info [queue-worker:processJob]: 🐂 Worker taking job 3a522972-796a-465a-ab6e-9c2357caef45
worker-1              | 2025-03-05 01:47:46 info [ScrapeURL:]: Scraping URL "https://news.ycombinator.com/"...
worker-1              | 2025-03-05 01:47:46 info [ScrapeURL:]: Scraping via playwright...
worker-1              | 2025-03-05 01:47:46 warn [ScrapeURL:]: An unexpected error happened while scraping with playwright. {"module":"ScrapeURL","scrapeId":"3a522972-796a-465a-ab6e-9c2357caef45","scrapeURL":"https://news.ycombinator.com/","error":{"name":"Error","message":"Request sent failure status","stack":"Error: Request sent failure status\n    at robustFetch (/app/dist/src/scraper/scrapeURL/lib/fetch.js:162:19)\n    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)\n    at async scrapeURLWithPlaywright (/app/dist/src/scraper/scrapeURL/engines/playwright/index.js:10:9)\n    at async scrapeURLWithEngine (/app/dist/src/scraper/scrapeURL/engines/index.js:322:12)\n    at async scrapeURLLoop (/app/dist/src/scraper/scrapeURL/index.js:134:35)\n    at async scrapeURL (/app/dist/src/scraper/scrapeURL/index.js:274:24)\n    at async runWebScraper (/app/dist/src/main/runWebScraper.js:66:24)\n    at async startWebScraperPipeline (/app/dist/src/main/runWebScraper.js:11:12)\n    at async processJob (/app/dist/src/services/queue-worker.js:727:26)\n    at async processJobInternal (/app/dist/src/services/queue-worker.js:203:28)","cause":{"params":{"url":"http://playwright-service:3000/html","logger":{},"method":"POST","body":{"url":"https://news.ycombinator.com/","wait_after_load":0,"timeout":300000},"headers":{"Content-Type":"application/json"},"schema":{"_def":{"unknownKeys":"strip","catchall":{"_def":{"typeName":"ZodNever"},"~standard":{"version":1,"vendor":"zod"}},"typeName":"ZodObject"},"~standard":{"version":1,"vendor":"zod"},"_cached":null},"ignoreResponse":false,"ignoreFailure":false,"tryCount":1},"response":{"status":404,"headers":{},"body":"<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n<meta charset=\"utf-8\">\n<title>Error</title>\n</head>\n<body>\n<pre>Cannot POST /html</pre>\n</body>\n</html>\n"},"requestId":"699e1600-39f3-4fca-8bc7-a075efd0757b"}}}
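
Reading the first error, the schema-analysis step seems to expect an object with `isMultiEntity` (boolean), `reasoning` (string), and `keyIndicators` (array), but the `value` that llama3.2:1b returned is a JSON Schema document with none of those keys. The Python sketch below only illustrates that mismatch and is not Firecrawl's code: `REQUIRED` and `missing_or_invalid` are hypothetical names, and the payload is an abridged copy of the `value` field from the log.

```python
import json

# Required keys and types, read off the ZodError "path" / "expected" fields in the log.
# (Hypothetical reconstruction for illustration; not Firecrawl's actual schema object.)
REQUIRED = {"isMultiEntity": bool, "reasoning": str, "keyIndicators": list}


def missing_or_invalid(payload: dict) -> list[str]:
    """Return the required keys that are absent from payload or have the wrong type."""
    return [key for key, typ in REQUIRED.items() if not isinstance(payload.get(key), typ)]


# Abridged copy of the "value" field from the log: a JSON Schema document,
# not the analysis object the validator was expecting.
model_output = json.loads(
    '{"$schema": "http://json-schema.org/draft-07/schema#",'
    ' "title": "Top Stories in Technology Sector",'
    ' "type": "object"}'
)

print(missing_or_invalid(model_output))
```

Run against the abridged payload, the function reports all three keys, which are the same three paths listed in the ZodError. The second log entry looks like a separate problem: the POST to http://playwright-service:3000/html came back with a 404 (`Cannot POST /html`).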

To Reproduce
Steps to reproduce the issue:

  1. .env file snippet
OPENAI_BASE_URL=http://localhost:11434/api
OLLAMA_BASE_URL=http://localhost:11434/api
MODEL_NAME=llama3.2:1b

I also tried a variation with both base URLs set to http://localhost:11434/ (without the /api suffix):

OPENAI_BASE_URL=http://localhost:11434/
OLLAMA_BASE_URL=http://localhost:11434/
MODEL_NAME=llama3.2:1b
  2. docker compose build
  3. docker compose up

Expected Behavior
The /crawl endpoint should work.
