SSHCluster - localhost to localhost - The command line is too long #8138

@jcus0006

I am trying to run a multi-node, multi-host SSH cluster on Windows. To simplify things for now, I am running both the scheduler and the workers on localhost. Following the Dask documentation, I set up public-key SSH access, in this case from localhost to localhost. I encountered this issue and resolved it with the fix recommended in the same link. I then hit the next issue: the command being run exceeds the command-line length limit imposed by Windows.

set_env = "set DASK_INTERNAL_INHERIT_CONFIG={} &&".format(
    dask.config.serialize(dask.config.global_config)
)

The line above, from "distributed\deploy\ssh.py", generates a string of more than 9000 characters, which seems to be the problem, since cmd.exe accepts command lines of at most 8191 characters.
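
To see how close a command is to the limit before it is sent, its length can be measured directly. The following is a minimal sketch, not the code in `distributed`: it assumes that `dask.config.serialize` produces base64-encoded JSON (imitated here with the standard library so the snippet runs without Dask), it uses made-up config dicts, and it takes 8191 characters as the `cmd.exe` limit. The helper names are hypothetical.

```python
import base64
import json

# Documented maximum command-line length for cmd.exe on current Windows versions.
CMD_EXE_LIMIT = 8191


def serialize(config: dict) -> str:
    """Stand-in for dask.config.serialize (assumed: base64-encoded JSON)."""
    return base64.urlsafe_b64encode(json.dumps(config).encode()).decode()


def build_set_env(config: dict) -> str:
    """Mirror the Windows `set ... &&` prefix quoted above."""
    return "set DASK_INTERNAL_INHERIT_CONFIG={} &&".format(serialize(config))


def fits_cmd_exe(command: str, limit: int = CMD_EXE_LIMIT) -> bool:
    """Return True if the command is short enough for cmd.exe."""
    return len(command) <= limit


# Hypothetical configs: a small one, and one padded to resemble a large global config.
small_config = {"distributed": {"scheduler": {"work-stealing": True}}}
large_config = {"kubernetes": {"padding": "x" * 9000}, **small_config}

for name, cfg in [("small", small_config), ("large", large_config)]:
    prefix = build_set_env(cfg)
    print(name, len(prefix), "fits" if fits_cmd_exe(prefix) else "too long")
```

Base64 encoding inflates the JSON by roughly a third, so a config can exceed the limit well before its plain JSON form reaches 8191 characters.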

The next line of code creates the command "cmd", and the following line starts the process:
self.proc = await self.connection.create_process(cmd)

The line below then reads this error from stderr, 'The command line is too long.\r\n':
line = await self.proc.stderr.readline()

To reduce the size of the serialized config, I tried removing the Kubernetes key from dask.config.global_config and re-adding it with an empty dict as its value, since I am using SSHCluster rather than KubeCluster and should not need Kubernetes. The serialized config is then shorter than the limit, and I do get past the 'The command line is too long' error, but I am stuck on the error below instead:

2023-08-28 21:10:06,883 - distributed.deploy.ssh - INFO - raise JSONDecodeError("Expecting value", s, err.value) from None
2023-08-28 21:10:06,883 - distributed.deploy.ssh - INFO - json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
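
The `char 0` position means the JSON parser rejected the very first character of its input. One plausible cause, consistent with the quoting change proposed later in this thread, is that `cmd.exe` does not treat single quotes as quoting characters, so the `--spec` argument may reach Python with a literal `'` in front and its double quotes consumed. This is an assumption rather than a confirmed diagnosis; the sketch below only shows that such a string produces the same message. The `mangled_spec` value and the `decode_error` helper are hypothetical.

```python
import json
from typing import Optional

# Hypothetical spec argument as it might arrive when the single quotes are kept
# literally and the inner double quotes are consumed by the shell.
mangled_spec = "'{cls: distributed.Scheduler, opts: {}}'"


def decode_error(text: str) -> Optional[str]:
    """Return the JSONDecodeError message for `text`, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


print(decode_error(mangled_spec))
```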

I am on Windows right now and am considering installing a Linux VM to try this out. Has anyone else had this issue on Windows, and what can be done to work around it?

This is the code I am using in the main module:

import dask
from dask.distributed import Client, SSHCluster

cluster = SSHCluster(
    ["localhost", "localhost"],
    connect_options={"known_hosts": None},
    worker_options={"n_workers": 10},
    scheduler_options={"port": 0, "dashboard_address": ":8797"},
)
client = Client(cluster)

Environment:

  • Dask version 2023.8.1
  • Python version 3.11.2
  • OS: Windows 10
  • Installed via Pip

onurarpacioglu commented on Dec 9, 2024

Hello,

I encountered the same issue. I found a fix for the JSON issue and a way to shorten the command somewhat. With the changes below (the three commented lines), I no longer see the issue:

    cmd = " ".join(
        [
            # set_env,  # removed to shorten cmd; it is run separately before cmd to preserve functionality
            self.remote_python,
            "-m",
            "distributed.cli.dask_spec",
            "--spec",
            # swapped the roles of ' and " around the spec to fix the JSON issue
            '"%s"' % dumps({"cls": "distributed.Scheduler", "opts": self.kwargs}).replace('"', '\\"'),
        ]
    )
    await self.connection.run(set_env)  # added because set_env was removed from cmd above
    self.proc = await self.connection.create_process(cmd)
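
The quoting change can be checked in isolation. The sketch below compares the hand-written expression from the patch with `subprocess.list2cmdline`, a standard-library helper that applies the Microsoft C runtime quoting rules. The `spec` dict and both function names are hypothetical, and nothing here touches `distributed` itself.

```python
import json
import subprocess

# Hypothetical spec, shaped like the one passed to --spec in the patch above.
spec = {"cls": "distributed.Scheduler", "opts": {"port": 0}}


def quote_manually(obj: dict) -> str:
    """Same expression as in the patch: wrap in double quotes, escape inner ones."""
    return '"%s"' % json.dumps(obj).replace('"', '\\"')


def quote_with_stdlib(obj: dict) -> str:
    """Quote a single argument using the Microsoft C runtime rules."""
    return subprocess.list2cmdline([json.dumps(obj)])


print(quote_manually(spec))
print(quote_manually(spec) == quote_with_stdlib(spec))
```

The two forms agree as long as the JSON contains no backslashes. Values such as Windows paths put backslashes directly in front of a quote, which the manual `replace` does not double, so the helper is the safer choice there. Separately, whether a variable set through `connection.run` is still visible to the later `create_process` call depends on both calls sharing one shell session, which is not verified here.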

Thanks

jcus0006 commented on Dec 9, 2024

Hi @onurarpacioglu, I ended up using Linux, but thanks for replying with your workaround. Hopefully it will be useful to someone else.

holtvogt commented on Dec 10, 2024

I am seeing the same issue on Windows as well.
