Exception: Connection closed while reading from the driver #574

Closed
xjtupy opened this issue Aug 22, 2024 · 7 comments

xjtupy commented Aug 22, 2024

Hi, I tried running the following code:

graph_config = {
    "llm": {
        "model": "ollama/llama3",
        "temperature": 0,
        "format": "json",
        "base_url": "http://xx.xx.xx.xx:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
    },
    "verbose": True
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the titles",
    source="https://blog.csdn.net/mopmgerg54mo/article/details/141028116",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

It reports this error:
Exception: Connection closed while reading from the driver

Could you help me solve it?
Thanks

Contributor

goasleep commented Aug 22, 2024

This error seems to be raised by Playwright. You can try the following:

  • Upgrade Playwright.
  • Make sure your network connection is working.

If that doesn't work, could you share the full exception stack trace with us? @xjtupy
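
Before upgrading, it can help to record which versions are actually installed in the interpreter that runs the scraper. The following is a minimal sketch using only the standard library; the helper name `package_versions` is made up for illustration, and the package list is an assumption based on the names that appear in this thread.

```python
import platform
import sys
from importlib import metadata


def package_versions(names):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this interpreter.
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python:", sys.version.split()[0], "on", platform.platform())
    # Package names taken from this thread; adjust to your own setup.
    report = package_versions(["playwright", "scrapegraphai", "nest-asyncio"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same Python (or the same Jupyter kernel) that raises the exception, since a different environment may report different versions.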

Author

xjtupy commented Aug 22, 2024

@goasleep I installed the latest version of Playwright==1.46.0 and the network is working fine.

The complete exception information is as follows:

--- Executing Fetch Node ---
--- (Fetching HTML from: https://blog.csdn.net/mopmgerg54mo/article/details/141028116) ---
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_188540/3999856684.py in <module>
     19 )
     20 
---> 21 result = smart_scraper_graph.run()
     22 print(result)

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/smart_scraper_graph.py in run(self)
    112 
    113         inputs = {"user_prompt": self.prompt, self.input_key: self.source}
--> 114         self.final_state, self.execution_info = self.graph.execute(inputs)
    115 
    116         return self.final_state.get("answer", "No answer found.")

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py in execute(self, initial_state)
    261             return (result["_state"], [])
    262         else:
--> 263             return self._execute_standard(initial_state)
    264 
    265     def append_node(self, node):

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py in _execute_standard(self, initial_state)
    183                         exception=str(e)
    184                     )
--> 185                     raise e
    186                 node_exec_time = time.time() - curr_time
    187                 total_exec_time += node_exec_time

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py in _execute_standard(self, initial_state)
    167             with get_openai_callback() as cb:
    168                 try:
--> 169                     result = current_node.execute(state)
    170                 except Exception as e:
    171                     error_node = current_node.node_name

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py in execute(self, state)
    125             return self.handle_local_source(state, source)
    126         else:
--> 127             return self.handle_web_source(state, source)
    128 
    129     def handle_directory(self, state, input_type, source):

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py in handle_web_source(self, state, source)
    277             else:
    278                 loader = ChromiumLoader([source], headless=self.headless, **loader_kwargs)
--> 279                 document = loader.load()
    280 
    281             if not document or not document[0].page_content.strip():

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/langchain_core/document_loaders/base.py in load(self)
     28     def load(self) -> List[Document]:
     29         """Load data into Document objects."""
---> 30         return list(self.lazy_load())
     31 
     32     async def aload(self) -> List[Document]:

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py in lazy_load(self)
    109 
    110         for url in self.urls:
--> 111             html_content = asyncio.run(scraping_fn(url))
    112             metadata = {"source": url}
    113             yield Document(page_content=html_content, metadata=metadata)

~/.local/lib/python3.9/site-packages/nest_asyncio.py in run(future, debug)
     30         loop = asyncio.get_event_loop()
     31         loop.set_debug(debug)
---> 32         return loop.run_until_complete(future)
     33 
     34     if sys.version_info >= (3, 6, 0):

~/.local/lib/python3.9/site-packages/nest_asyncio.py in run_until_complete(self, future)
     68                 raise RuntimeError(
     69                     'Event loop stopped before Future completed.')
---> 70             return f.result()
     71 
     72     def _run_once(self):

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in wrap_api_call(self, cb, is_internal)
    510         self._api_zone.set(parsed_st)
    511         try:
--> 512             return await cb()
    513         except Exception as error:
    514             raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in inner_send(self, method, params, return_as_dict)
     95         if not callback.future.done():
     96             callback.future.cancel()
---> 97         result = next(iter(done)).result()
     98         # Protocol now has named return values, assume result is one level deeper unless
     99         # there is explicit ambiguity.

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/tasks.py in __step(***failed resolving arguments***)
    254                 # We use the `send` method directly, because coroutines
    255                 # don't have `__iter__` and `__next__` methods.
--> 256                 result = coro.send(None)
    257             else:
    258                 result = coro.throw(exc)

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py in ascrape_playwright(self, url)
     78         logger.info("Starting scraping...")
     79         results = ""
---> 80         async with async_playwright() as p:
     81             browser = await p.chromium.launch(
     82                 headless=self.headless, proxy=self.proxy, **self.browser_config

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/async_api/_context_manager.py in __aenter__(self)
     44         if not playwright_future.done():
     45             playwright_future.cancel()
---> 46         playwright = AsyncPlaywright(next(iter(done)).result())
     47         playwright.stop = self.__aexit__  # type: ignore
     48         return playwright

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

Exception: Connection closed while reading from the driver

@goasleep
Contributor

I got it. You are running this code in a Jupyter notebook. Jupyter has its own async event loop, and asyncio.run tries to open a new event loop, so it raises this error. You could switch to a plain Python file to run your script. If you would rather stay in Jupyter, run the following lines before executing your main script.

!pip install nest-asyncio
import nest_asyncio
nest_asyncio.apply()

graph_config = ....
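
For readers who want to see the event-loop conflict in isolation, here is a minimal sketch of a different technique from `nest_asyncio`: it checks whether a loop is already running in the current thread and, if so, runs the coroutine with `asyncio.run` in a worker thread instead. The helper name `run_coroutine` is hypothetical and is not part of ScrapeGraphAI or Playwright; this only illustrates the nested-loop situation and is not a confirmed fix for the driver error reported here.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_coroutine(coro):
    """Run `coro` to completion whether or not an event loop is already running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread (plain script): asyncio.run is safe.
        return asyncio.run(coro)
    # A loop is already running (for example in a Jupyter kernel), so give the
    # coroutine a fresh loop in a separate thread and block until it finishes.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def double(x):
    await asyncio.sleep(0)
    return 2 * x


if __name__ == "__main__":
    print(run_coroutine(double(21)))
```

Note that the second branch blocks the calling thread until the coroutine completes, which is usually acceptable for a notebook cell but not inside a server's event loop.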

@xjtupy

Author

xjtupy commented Aug 23, 2024

@goasleep I added the following code in Jupyter, and the error still occurs:

import nest_asyncio
nest_asyncio.apply()

In addition, I ran the same code from a plain Python file on Linux, and it reported the same error:

Traceback (most recent call last):
  File "/home/odin/ddmpeng/tmp.py", line 23, in <module>
    result = smart_scraper_graph.run()
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 114, in run
    self.final_state, self.execution_info = self.graph.execute(inputs)
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py", line 263, in execute
    return self._execute_standard(initial_state)
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py", line 185, in _execute_standard
    raise e
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/graphs/base_graph.py", line 169, in _execute_standard
    result = current_node.execute(state)
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py", line 127, in execute
    return self.handle_web_source(state, source)
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/nodes/fetch_node.py", line 279, in handle_web_source
    document = loader.load()
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/langchain_core/document_loaders/base.py", line 30, in load
    return list(self.lazy_load())
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py", line 111, in lazy_load
    html_content = asyncio.run(scraping_fn(url))
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 512, in wrap_api_call
    return await cb()
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 97, in inner_send
    result = next(iter(done)).result()
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/scrapegraphai/docloaders/chromium.py", line 80, in ascrape_playwright
    async with async_playwright() as p:
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/async_api/_context_manager.py", line 46, in __aenter__
    playwright = AsyncPlaywright(next(iter(done)).result())
Exception: Connection closed while reading from the driver
Task exception was never retrieved
future: <Task finished name='Task-4' coro=<Connection.run.<locals>.init() done, defined at /home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py:269> exception=Exception('Connection.init: Connection closed while reading from the driver')>
Traceback (most recent call last):
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 270, in init
    self.playwright_future.set_result(await self._root_object.initialize())
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 212, in initialize
    await self._channel.send(
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 59, in send
    return await self._connection.wrap_api_call(
  File "/home/odin/ddmpeng/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 514, in wrap_api_call
    raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None
Exception: Connection.init: Connection closed while reading from the driver

@goasleep
Contributor

I tried it on Linux but could not reproduce the problem. Could you run the code below in Jupyter? If you still get the same error, you may want to reach out to the Playwright maintainers for help. @xjtupy

import asyncio
import nest_asyncio
nest_asyncio.apply()

from playwright.async_api import async_playwright

url = "https://blog.csdn.net/mopmgerg54mo/article/details/141028116"
async def main():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url, wait_until="domcontentloaded")
        await browser.close()
        print(page)

asyncio.run(main())

Author

xjtupy commented Aug 23, 2024

@goasleep Unfortunately, this problem still occurs

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/tmp/ipykernel_188540/1335809689.py in <module>
     14         print(page)
     15 
---> 16 asyncio.run(main())

~/.local/lib/python3.9/site-packages/nest_asyncio.py in run(future, debug)
     30         loop = asyncio.get_event_loop()
     31         loop.set_debug(debug)
---> 32         return loop.run_until_complete(future)
     33 
     34     if sys.version_info >= (3, 6, 0):

~/.local/lib/python3.9/site-packages/nest_asyncio.py in run_until_complete(self, future)
     68                 raise RuntimeError(
     69                     'Event loop stopped before Future completed.')
---> 70             return f.result()
     71 
     72     def _run_once(self):

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in wrap_api_call(self, cb, is_internal)
    510         self._api_zone.set(parsed_st)
    511         try:
--> 512             return await cb()
    513         except Exception as error:
    514             raise rewrite_error(error, f"{parsed_st['apiName']}: {error}") from None

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/_impl/_connection.py in inner_send(self, method, params, return_as_dict)
     95         if not callback.future.done():
     96             callback.future.cancel()
---> 97         result = next(iter(done)).result()
     98         # Protocol now has named return values, assume result is one level deeper unless
     99         # there is explicit ambiguity.

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/tasks.py in __step(***failed resolving arguments***)
    254                 # We use the `send` method directly, because coroutines
    255                 # don't have `__iter__` and `__next__` methods.
--> 256                 result = coro.send(None)
    257             else:
    258                 result = coro.throw(exc)

/tmp/ipykernel_188540/1335809689.py in main()
      7 url = "https://blog.csdn.net/mopmgerg54mo/article/details/141028116"
      8 async def main():
----> 9     async with async_playwright() as p:
     10         browser = await p.chromium.launch(headless=True)
     11         page = await browser.new_page()

~/miniconda3/envs/py39ddmpeng/lib/python3.9/site-packages/playwright/async_api/_context_manager.py in __aenter__(self)
     44         if not playwright_future.done():
     45             playwright_future.cancel()
---> 46         playwright = AsyncPlaywright(next(iter(done)).result())
     47         playwright.stop = self.__aexit__  # type: ignore
     48         return playwright

~/miniconda3/envs/py39ddmpeng/lib/python3.9/asyncio/futures.py in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception
    202         return self._result
    203 

Exception: Connection closed while reading from the driver

Contributor

goasleep commented Aug 23, 2024

> @goasleep Unfortunately, this problem still occurs

Do you get the same error when running the code above? If so, you can ask the Playwright team for help: open a new issue in the Playwright repository and link it in this issue.

I suspect something in your environment is causing this. I suggest using Docker to isolate the environment and then trying again. @xjtupy
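
One way to narrow down an environment problem before filing a Playwright issue is to check whether the Playwright command-line entry point starts at all in the affected interpreter. The sketch below is only a diagnostic idea, not something confirmed in this thread: it assumes the `playwright` package can be invoked as `python -m playwright --version`, and the helper name `run_check` is made up for illustration.

```python
import subprocess
import sys


def run_check(command, timeout=60):
    """Run a command and return (exit_code, combined stdout and stderr)."""
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    return completed.returncode, (completed.stdout + completed.stderr).strip()


if __name__ == "__main__":
    # Assumption: the installed Playwright package exposes its CLI this way.
    code, output = run_check([sys.executable, "-m", "playwright", "--version"])
    print("exit code:", code)
    print(output or "(no output)")
```

A non-zero exit code or an error message here would suggest the problem lies in the Playwright installation or the host system rather than in ScrapeGraphAI, and the output is useful to attach to a Playwright issue.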
