
How do I run wxbot to crawl WeChat Official Account content under a Docker deployment? #293

Closed
WuYu-sky opened this issue Mar 5, 2025 · 10 comments

Comments

@WuYu-sky

WuYu-sky commented Mar 5, 2025

No description provided.

@bigbrother666sh
Member

This requires integrating wxbot's Docker setup and putting together a compose file; PRs are welcome.
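For anyone who wants to attempt this before an official compose file lands, here is a minimal sketch of the idea. Every image name, port, path and environment variable below is a placeholder rather than the project's actual configuration:

```yaml
# Minimal sketch only. Image names, build paths, ports and env vars are
# placeholders and must be adapted to the actual wxbot and wiseflow setups.
services:
  wxbot:
    image: your-wxbot-image:latest   # placeholder; use wxbot's own image or build
    ports:
      - "8066:8066"                  # assumed wxbot HTTP/callback port
  wiseflow:
    build: .
    env_file: ./core/.env            # adjust to wherever the .env actually lives
    environment:
      - WXBOT_URL=http://wxbot:8066  # hypothetical variable: point wiseflow at the wxbot service
    depends_on:
      - wxbot
```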

@leoxu2024

[INIT].... → Crawl4AI 0.5.0.post2
2025-03-09 05:16:22.423 | DEBUG | general_process:main_process:147 - process new url, still 0 urls in working list
[ERROR]... × https://mp.weixin.qq.com/s?__biz=MzUxNjg4NDEzNA==&... | Error:
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ × Unexpected error in _crawl_web at line 579 in _crawl_web (../../../.local/lib/python3.10/site- │
│ packages/crawl4ai/async_crawler_strategy.py): │
│ Error: Failed on navigating ACS-GOTO: │
│ Page.goto: net::ERR_NETWORK_CHANGED at │
│ https://mp.weixin.qq.com/s?__biz=MzUxNjg4NDEzNA==&mid=2247522395&idx=1&sn=0a3aabc3bb6b5cdb8fe8fe313225fd4b │
│ Call log: │
│ - navigating to │
│ "https://mp.weixin.qq.com/s?__biz=MzUxNjg4NDEzNA==&mid=2247522395&idx=1&sn=0a3aabc3bb6b5cdb8fe8fe313225fd4b", │
│ waiting until "commit" │
│ │
│ │
│ Code context: │
│ 574 response = await page.goto( │
│ 575 url, wait_until=config.wait_until, timeout=config.page_timeout │
│ 576 ) │
│ 577 redirected_url = page.url │
│ 578 except Error as e: │
│ 579 → raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}") │
│ 580 │
│ 581 await self.execute_hook( │
│ 582 "after_goto", page, context=context, url=url, response=response, config=config │
│ 583 ) │
│ 584 │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

2025-03-09 05:16:40.268 | WARNING | general_process:main_process:170 - https://mp.weixin.qq.com/s?__biz=MzUxNjg4NDEzNA==&mid=2247522395&idx=1&sn=0a3aabc3bb6b5cdb8fe8fe313225fd4b failed to crawl
2025-03-09 05:16:40.468 | DEBUG | general_process:main_process:236 - task finished, focus_id: 6g27p3ibst0t279
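
net::ERR_NETWORK_CHANGED is a Chromium-level error meaning the network connection changed while the page was loading; inside Docker this usually points to flaky DNS or container networking rather than a wiseflow parsing bug. A minimal sketch to check whether the URL is even reachable with crawl4ai from inside the same container (this assumes crawl4ai's AsyncWebCrawler / CrawlerRunConfig API; the retry loop is only illustrative):

```python
# Reachability check to run inside the container; assumes crawl4ai >= 0.4.2,
# where AsyncWebCrawler and CrawlerRunConfig are importable from the top level.
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

URL = "https://mp.weixin.qq.com/s?__biz=MzUxNjg4NDEzNA==&mid=2247522395&idx=1&sn=0a3aabc3bb6b5cdb8fe8fe313225fd4b"

async def main():
    async with AsyncWebCrawler() as crawler:
        for attempt in range(3):  # retry a couple of times in case the network flaps again
            result = await crawler.arun(URL, config=CrawlerRunConfig(page_timeout=60000))
            if result.success:
                print(f"attempt {attempt + 1}: fetched OK (status {result.status_code})")
                return
            print(f"attempt {attempt + 1}: {result.error_message}")

asyncio.run(main())
```

If this fails repeatedly with the same ERR_NETWORK_CHANGED, the problem is in the container's networking rather than in wiseflow.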

@leoxu2024

0.3.9 cannot crawl WeChat Official Account articles and throws the error above. How can I fix this?

@bigbrother666sh
Member

Try the 0.3.9-patch2 version and re-pull the code.
That version still has a problem parsing WeChat Official Account article content, though, so I suggest waiting until this weekend, when we will release 0.3.9-patch3.
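
For reference, re-pulling the code at the patch release would look roughly like the following; the tag name and the requirements path are assumptions, so check the repository's releases page and layout for the exact values:

```bash
cd wiseflow
git fetch --all --tags
git checkout V0.3.9-patch2              # tag name is an assumption; use whatever the releases page lists
pip install -r core/requirements.txt    # path may differ depending on the repo layout
```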

@leoxu2024

(quotes the crash log from the earlier comment)

@bigbrother666sh Hi, how is the fix for the WeChat Official Account crawling bug coming along? Will patch3 solve it? It still fails to crawl right now.

@leoxu2024

Try the 0.3.9-patch2 version and re-pull the code. That version still has a problem parsing WeChat Official Account article content, though, so I suggest waiting until this weekend, when we will release 0.3.9-patch3.

@bigbrother666sh Still waiting for patch3 to fix the problem of WeChat Official Account articles not being crawlable.

@bigbrother666sh
Member

Please wait one more day; it should be released tomorrow night.

@leoxu2024

Please wait one more day; it should be released tomorrow night.

@bigbrother666sh I have updated to 3.9-patch3, but WeChat Official Account articles still cannot be crawled. The error is as follows:
2025-03-19 12:05:26.042 | DEBUG | general_process:main_process:62 - focus_id: 6g27p3ibst0t279, focus_point: AI、人工智能、知
[INIT].... → Crawl4AI 0.5.0.post4
2025-03-19 12:05:27.265 | DEBUG | general_process:main_process:145 - process new url, still 7 urls in working list
Task exception was never retrieved
future: <Task finished name='Task-28' coro=<main_process() done, defined at /home/jzh/wiseflow/wiseflow-3.9/weixin_mp/../core/r a dict')>
Traceback (most recent call last):
File "/home/jzh/wiseflow/wiseflow-3.9/weixin_mp/../core/general_process.py", line 168, in main_process
result = custom_scrapers[domain](fetch_result)
File "/home/jzh/wiseflow/wiseflow-3.9/weixin_mp/../core/scrapers/mp_scraper.py", line 33, in mp_scraper
raise TypeError('fetch_result must be a CrawlResult or a dict')
TypeError: fetch_result must be a CrawlResult or a dict

@bigbrother666sh
Member

After upgrading to 3.9-patch3, first run pip uninstall crawl4ai
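
The TypeError above suggests the result object is coming from a crawl4ai install that wiseflow's mp_scraper does not recognize, i.e. a stale or mismatched crawl4ai package is being picked up, which is why removing it first helps. A hedged sketch of the clean-up sequence (the requirements path is an assumption; adjust it to the repo layout):

```bash
pip uninstall -y crawl4ai
pip install -r core/requirements.txt                         # reinstall the version pinned by wiseflow
python -c "import crawl4ai; print(crawl4ai.__version__)"     # sanity-check which crawl4ai is now imported
```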

@leoxu2024

After upgrading to 3.9-patch3, first run pip uninstall crawl4ai

@bigbrother666sh After upgrading to 3.9-patch3, WeChat Official Account articles can be crawled now. Thanks.
