wandb sync not logging in while running wandb local #1239

Closed
aclifton314 opened this issue Sep 16, 2020 · 30 comments
Labels
ty:bug type of the issue is a bug

Comments

@aclifton314

System Info

wandb: 0.9.7
python: 3.7.6
OS: Pop!_OS 20.04 LTS

Description

I installed wandb 0.9.7, ran wandb local, and navigated to http://localhost:8080, where the webpage showed an "Application Error" page. I clicked the refresh button provided on the page and created an account, and it then asked me to change my password. When I entered my new password, the page reloaded and repeatedly asked me to change the password. However, if I click the icon in the upper right I can get to the profile page.

I tried to run wandb sync MY_DRYRUN and was asked to run wandb login. I ran wandb login and a webpage briefly appeared with the local API key, but then quickly switched to the Change Password prompt. However, I was able to get the local API key from when I clicked on the icon in the upper right corner before.

I pasted the local API key into the command line and logged in successfully. I then tried to run the wandb sync command again and got the following:

user@pop-os:~$ wandb sync path/to/wandb/dryrun-20200828_220136-10kab6tp/
wandb: ERROR Error while calling W&B API: permission denied (<Response [401]>)
Error: Invalid or missing api_key.  Run wandb login

I'm not sure if this workaround has messed something up, but I cannot sync using wandb local. I know this is similar to #1222. The main difference is that now I am behind a company proxy. I did some work to try and fix that for docker and am able to run docker run hello-world successfully.

Any thoughts about what might be going on?

@issue-label-bot issue-label-bot bot added the ty:bug type of the issue is a bug label Sep 16, 2020
@issue-label-bot

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.93. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

@vanpelt
Contributor

vanpelt commented Sep 16, 2020

@aclifton314 what email address did you use for your account? Can you try logging in to the local instance from an incognito window?

@aclifton314
Author

I used the same email address as in #1222. I'm not sure if that would be causing a conflict. Either way it is a Gmail address. I used Firefox to open a private window (not sure if that is the same as incognito) and went to the local instance. I was taken to a page that says "Developer tools for deep learning" and clicked the login button. I was prompted to enter my email and password, and that sent me to my dashboard.

I then went to the command line and had the following error:

user@pop-os:~$ wandb sync wandb/dryrun-20200828_220136-10kab6tp/
wandb: ERROR Error while calling W&B API: permission denied (<Response [401]>)
Error: Invalid or missing api_key.  Run wandb login
user@pop-os:~$ wandb login
wandb: You can find your API key in your browser here: http://localhost:8080/authorize
wandb: Paste an API key from your profile and hit enter: MY_LOCAL_API_KEY
wandb: Appending key for localhost to your netrc file: /home/user/.netrc
Successfully logged in to Weights & Biases!
user@pop-os:~$ wandb sync wandb/dryrun-20200828_220136-10kab6tp/
wandb: ERROR Error while calling W&B API: permission denied (<Response [401]>)
Error: Invalid or missing api_key.  Run wandb login

@vanpelt
Contributor

vanpelt commented Sep 17, 2020

Does your api key start with local-XXXXXX? You can see what is currently set for localhost by running cat /home/user/.netrc and looking at the password under machine localhost. If it does contain local and it's the same as the one you see when you go to http://localhost:8080/authorize, then the library may be trying to connect to https://api.wandb.ai with the local api key. Can you paste the contents of wandb/debug.log if it still fails? Also, can you confirm the version of wandb with wandb --version? I assume it's 0.9.7.
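
The .netrc check described above can also be scripted. This is a minimal sketch using Python's standard netrc module. The function name stored_key is hypothetical (it is not part of wandb), and the sketch assumes the key is stored as the password field under machine localhost, as described in this comment.

```python
import netrc
from pathlib import Path
from typing import Optional


def stored_key(host: str = "localhost", netrc_path: Optional[str] = None) -> Optional[str]:
    """Return the password (API key) stored for `host` in a netrc file, or None."""
    path = netrc_path or str(Path.home() / ".netrc")
    try:
        entry = netrc.netrc(path).authenticators(host)
    except (OSError, netrc.NetrcParseError):
        # Missing, unreadable, or malformed file
        return None
    # authenticators() returns (login, account, password), or None if the host is absent
    return entry[2] if entry else None


if __name__ == "__main__":
    key = stored_key()
    if key is None:
        print("no entry for localhost in ~/.netrc")
    else:
        # Report only whether the key looks local, so the key itself is not printed
        print("key for localhost starts with 'local':", key.startswith("local"))
```

Comparing the result with the key shown at http://localhost:8080/authorize is still a manual step.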

@aclifton314
Author

Yes, the api key starts with local and is the same one when I go to http://localhost:8080/authorize. Here are the contents of wandb/debug.log:

2020-09-17 09:41:03,696 DEBUG   MainThread:5073 [wandb_config.py:_load_defaults():150] wandb dir not provided, skipping defaults
2020-09-17 09:41:03,755 ERROR   MainThread:5073 [internal.py:execute():111] 401 response executing GraphQL.
2020-09-17 09:41:03,756 ERROR   MainThread:5073 [internal.py:execute():112] {"errors":[{"message":"permission denied","path":["upsertBucket"],"extensions":{"code":"PERMISSION_ERROR"}}],"data":{"upsertBucket":null}}
2020-09-17 09:41:03,766 ERROR   MainThread:5073 [cli.py:wrapper():161] Traceback (most recent call last):
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/retry.py", line 95, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/apis/internal.py", line 114, in execute
    six.reraise(*sys.exc_info())
  File "/path/to/anaconda3/lib/python3.7/site-packages/six.py", line 703, in reraise
    raise value
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/apis/internal.py", line 108, in execute
    return self.client.execute(*args, **kwargs)
  File "/path/to/anaconda3/lib/python3.7/site-packages/gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/path/to/anaconda3/lib/python3.7/site-packages/gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/path/to/anaconda3/lib/python3.7/site-packages/gql/transport/requests.py", line 39, in execute
    request.raise_for_status()
  File "/path/to/anaconda3/lib/python3.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://localhost:8080/graphql

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/apis/__init__.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/apis/internal.py", line 785, in upsert_run
    mutation, variable_values=variable_values, **kwargs)
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/retry.py", line 102, in __call__
    if not check_retry_fn(e):
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/util.py", line 570, in no_retry_auth
    raise CommError("Invalid or missing api_key.  Run wandb login" + extra)
wandb.apis.CommError: Invalid or missing api_key.  Run wandb login

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/cli.py", line 156, in wrapper
    return func(*args, **kwargs)
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/cli.py", line 447, in sync
    path, run_id=id, project=project, entity=entity, ignore_globs=globs)
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/wandb_run.py", line 320, in from_directory
    res = api.upsert_run(name=run_id, project=project, entity=entity, display_name=run_name)
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/apis/__init__.py", line 109, in wrapper
    message, err), sys.exc_info()[2])
  File "/path/to/anaconda3/lib/python3.7/site-packages/six.py", line 702, in reraise
    raise value.with_traceback(tb)
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/apis/__init__.py", line 72, in wrapper
    return func(*args, **kwargs)
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/apis/internal.py", line 785, in upsert_run
    mutation, variable_values=variable_values, **kwargs)
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/retry.py", line 102, in __call__
    if not check_retry_fn(e):
  File "/path/to/anaconda3/lib/python3.7/site-packages/wandb/util.py", line 570, in no_retry_auth
    raise CommError("Invalid or missing api_key.  Run wandb login" + extra)
wandb.apis.CommError: Invalid or missing api_key.  Run wandb login

I am indeed using wandb 0.9.7
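
For a log like the one above, a small script can pull out which URL returned the 401 and which GraphQL error code came back, which helps tell whether the request went to the local server or to https://api.wandb.ai. This is only a sketch: summarize_debug_log is a hypothetical helper, and the patterns assume the log line formats shown in this comment.

```python
import json
import re
from typing import Dict, List

# Matches a requests HTTPError line such as:
# "requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: http://localhost:8080/graphql"
HTTP_ERROR_RE = re.compile(r"HTTPError: (\d{3}) .* for url: (\S+)")


def summarize_debug_log(text: str) -> Dict[str, List[str]]:
    """Collect failing URLs and GraphQL error codes found in a debug log."""
    urls: List[str] = []
    codes: List[str] = []
    for line in text.splitlines():
        match = HTTP_ERROR_RE.search(line)
        if match:
            urls.append(f"{match.group(1)} {match.group(2)}")
        # GraphQL error payloads appear as a JSON object at the end of the line
        start = line.find('{"errors"')
        if start != -1:
            try:
                payload = json.loads(line[start:])
            except json.JSONDecodeError:
                continue
            for err in payload.get("errors", []):
                codes.append(err.get("extensions", {}).get("code", "UNKNOWN"))
    return {"urls": urls, "codes": codes}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py wandb/debug.log
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            print(summarize_debug_log(fh.read()))
```

In the log above, the failing URL is http://localhost:8080/graphql, so the 401 came from the local server rather than from the hosted service.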

@vanpelt
Contributor

vanpelt commented Sep 17, 2020

@aclifton314 I'm at a loss for what's happening. You either have an incorrect api_key configured, or the entity / project you're trying to log to doesn't belong to the user you created. You can also try running your script with an api key set in your environment. Go to http://localhost:8080/settings and copy your api key. Then run your script with:

WANDB_API_KEY=YOUR_API_KEY_HERE python your_script.py

@aclifton314
Author

@vanpelt The dryrun I created was actually created on another computing cluster that has wandb but not docker. I wasn't able to view the wandb results because it doesn't have docker installed. So I moved the dryrun folder to my local machine to view the results. Do you think this could have something to do with it? The fact that the dryrun was created on one machine (with one certain api key) and I'm trying to view it on a different machine (with a different api key)?

@vanpelt
Contributor

vanpelt commented Sep 18, 2020

It could be looking at the wandb/settings file and trying to log to the wrong entity. Can you share the content of that file?

@aclifton314
Author

aclifton314 commented Sep 18, 2020

It just says [default].

If that isn't helpful, is there a way to:

  1. Completely remove all traces of wandb from my local machine? I noticed that the ~/.netrc file still remains even after a pip uninstall.

  2. Provide me a list of files that should be in the dry run directories as well as any settings file? I don't need the contents. I just want to make sure that everything is there for wandb sync to work.

@vanpelt
Contributor

vanpelt commented Sep 18, 2020

The only files wandb writes are ~/.netrc, ~/.config/wandb/settings, and ./wandb/settings. There's a chance a different entity is set in your home directory's config, or in your terminal environment. One thing to try would be opening a python console on your local machine and confirming you can log to your local server with:

import wandb
wandb.init(project="local_test")
wandb.join()
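
To see what the two settings files and the terminal environment currently contain, something like the following could be used. It is a sketch, not wandb's own loading logic: read_wandb_settings is a hypothetical helper, the files are assumed to be INI-style with a [default] section (as in the examples in this thread), and no claim is made about which source takes precedence. WANDB_BASE_URL, WANDB_ENTITY, and WANDB_PROJECT are environment variables the wandb client reads.

```python
import configparser
import os
from pathlib import Path
from typing import Dict


def read_wandb_settings(path) -> Dict[str, str]:
    """Return the key/value pairs of the [default] section, or {} if absent."""
    # interpolation=None keeps values containing '%' from raising errors
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(str(path))  # silently skips files that do not exist
    if not parser.has_section("default"):
        return {}
    return dict(parser.items("default"))


if __name__ == "__main__":
    for candidate in (Path.home() / ".config" / "wandb" / "settings",
                      Path("wandb") / "settings"):
        print(candidate, "->", read_wandb_settings(candidate))
    # Environment variables that may point the client somewhere else
    for name in ("WANDB_BASE_URL", "WANDB_ENTITY", "WANDB_PROJECT"):
        print(name, "=", os.environ.get(name))
```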

@aclifton314
Author

Ok. Here is what is in my ~/.config/wandb/settings:

[default]
anonymous = false
base_url = http://localhost:8080

Also, here is the result of the python console commands:

user@pop-os:~$ python
Python 3.7.6 (default, Jan  8 2020, 19:59:22) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import wandb
>>> wandb.init(project='local_test')
Retry attempt failed:
Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.7/site-packages/urllib3/connection.py", line 157, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/home/user/anaconda3/lib/python3.7/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/home/user/anaconda3/lib/python3.7/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/home/user/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/user/anaconda3/lib/python3.7/http/client.py", line 1252, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/user/anaconda3/lib/python3.7/http/client.py", line 1298, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/user/anaconda3/lib/python3.7/http/client.py", line 1247, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/user/anaconda3/lib/python3.7/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/home/user/anaconda3/lib/python3.7/http/client.py", line 966, in send
    self.connect()
  File "/home/user/anaconda3/lib/python3.7/site-packages/urllib3/connection.py", line 184, in connect
    conn = self._new_conn()
  File "/home/user/anaconda3/lib/python3.7/site-packages/urllib3/connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f2b4e2b2b50>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/user/anaconda3/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/user/anaconda3/lib/python3.7/site-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2b4e2b2b50>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/lib/python3.7/site-packages/wandb/retry.py", line 95, in __call__
    result = self._call_fn(*args, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/wandb/apis/internal.py", line 108, in execute
    return self.client.execute(*args, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/gql/transport/requests.py", line 38, in execute
    request = requests.post(self.url, **post_args)
  File "/home/user/anaconda3/lib/python3.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2b4e2b2b50>: Failed to establish a new connection: [Errno 111] Connection refused'))
wandb: Network error (ConnectionError), entering retry loop. See /tmp/wandb-debug.log for full traceback.
wandb: Tracking run with wandb version 0.9.7
wandb: Network error (ConnectionError), entering retry loop. See /home/user/wandb/debug.log for full traceback.
wandb: Run data is saved locally in wandb/run-20200917_205542-3vven7ou
wandb: Network error (ConnectionError), entering retry loop. See /home/user/wandb/debug.log for full traceback.

W&B Error: Can't connect to network to query entity from API key
>>> wandb: Network error (ConnectionError), entering retry loop. See /home/user/wandb/debug.log for full traceback.
wandb: Ctrl-c pressed.

@vanpelt
Contributor

vanpelt commented Sep 18, 2020

That error means we can't connect to http://localhost:8080. Are you able to access http://localhost:8080 via a web browser running on that machine? You can run docker ps on that machine to see if our container is running. If you're running the docker container on a different machine than your python console, you would need to open your firewall and connect to the IP address of the machine it's running on.

@aclifton314
Author

On the same machine that has the container running, I did the following:

user@pop-os:~/path/to/project$ wandb local
wandb: A new version of W&B local is available, upgrade by calling `wandb local --upgrade`
wandb: W&B local started at http://localhost:8080 🚀
wandb: You can stop the server by running `docker stop wandb-local`
user@pop-os:~/path/to/project$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                    NAMES
d94e1b5b70df        wandb/local         "/sbin/my_init"     28 seconds ago      Up 28 seconds       0.0.0.0:8080->8080/tcp   wandb-local
user@pop-os:~/path/to/project$ 

and was able to access http://localhost:8080 via a web browser running on the same machine that is running the container. The web browser takes me to the home page that says "Get started with Weights and Biases".

@vanpelt
Contributor

vanpelt commented Sep 18, 2020

From the same terminal you were running your python session in, can you run curl -v http://localhost:8080 and make sure it says < HTTP/1.1 200 OK in the output? Once that's confirmed, run python and try the three lines of python I sent.
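
The same reachability check can be made from Python, which is the process that actually fails here. The sketch below uses only the standard library. http_status is a hypothetical helper, and the use_env_proxies switch is there so the request can be tried both with and without the proxy environment variables of the current shell.

```python
import urllib.error
import urllib.request
from typing import Optional


def http_status(url: str, use_env_proxies: bool, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status for `url`, or None if no connection could be made."""
    # ProxyHandler() reads the *_proxy environment variables;
    # ProxyHandler({}) disables proxies entirely.
    if use_env_proxies:
        handler = urllib.request.ProxyHandler()
    else:
        handler = urllib.request.ProxyHandler({})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, just not with a 2xx status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


if __name__ == "__main__":
    url = "http://localhost:8080"
    print("with env proxies:   ", http_status(url, use_env_proxies=True))
    print("without env proxies:", http_status(url, use_env_proxies=False))
```

If both calls return None while the browser can load the page, the problem is likely in how the port is exposed to the shell rather than in the wandb client.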

@aclifton314
Author

@vanpelt As always, thank you for your patience and cooperation working through this. Here is the output of that curl command:

* Uses proxy env variable no_proxy == 'localhost,127.0.0.0/8,::1,127.0.0.1,127.0.0.111,127.0.0.2'
*   Trying ::1:8080...
* TCP_NODELAY set
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1:8080...
* TCP_NODELAY set
* connect to 127.0.0.1 port 8080 failed: Connection refused
* Failed to connect to localhost port 8080: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 8080: Connection refused

@vanpelt
Contributor

vanpelt commented Sep 21, 2020

It looks like you can't connect to localhost due to firewall or networking configuration issues. If you can connect via your browser it may be that the browser has a proper proxy configuration that allows it. You can use the HTTP_PROXY variable if you do indeed need to connect to localhost through a proxy, but that's unlikely.
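
To see which proxy settings a Python process picks up from the shell, and whether localhost is excluded through no_proxy, the standard library can report them directly. This is a sketch: proxy_report is a hypothetical helper, and it only reflects environment variables, not any proxy configured inside the browser.

```python
import urllib.request
from typing import Dict


def proxy_report(host: str = "localhost") -> Dict[str, object]:
    """Summarize proxy environment variables and whether `host` bypasses them."""
    # Keys are scheme names ("http", "https"); the no_proxy list is stored under "no"
    proxies = urllib.request.getproxies_environment()
    return {
        "proxies": {k: v for k, v in proxies.items() if k != "no"},
        "no_proxy": proxies.get("no", ""),
        "bypassed": bool(urllib.request.proxy_bypass_environment(host, proxies)),
    }


if __name__ == "__main__":
    print(proxy_report("localhost"))
```

In the curl output earlier in this thread, no_proxy already lists localhost, so that request was attempted directly and was refused at the socket level.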

@aclifton314
Author

I have the HTTP_PROXY (and http_proxy) env variable set in my bashrc. Maybe it's not being read properly? I spoke with someone on my team familiar with docker and he suggested setting the --network host flag on the docker command. Does this sound like a reasonable solution? If so, how do I go about setting that flag?

@vanpelt
Contributor

vanpelt commented Sep 21, 2020

You can run the docker command manually instead of using wandb local:

docker run --rm -d -v wandb:/vol -p 8080:8080 --network host --name wandb-local wandb/local

@aclifton314
Author

Does --network host need to be --network localhost?

@vanpelt
Contributor

vanpelt commented Sep 21, 2020

Nope, I believe the "host" network is the default so I doubt this will fix it. The mystery is how your browser can connect to http://localhost:8080 but your shell can't.

@aclifton314
Author

Made some progress, but I don't know how, hahaha.

  • I checked localhost:8080/user_name and there was no project uploaded.
  • I ran docker run --rm -d -v wandb:/vol -p 8080:8080 --network host --name wandb-local wandb/local. I only got the following warning followed by a long random alphanumeric string: WARNING: Published ports are discarded when using host network mode
  • I went to the directory that had my dry run and ran wandb sync dryrun*. Here is the output:
user@pop-os:~/path/to/wandb$ wandb sync dryrun-20200828_220136-10kab6tp/
wandb: Syncing dryrun-20200828_220136-10kab6tp/ to:
wandb: winter-wildflower-1 http://localhost:8080/user/huggingface/runs/36dn3hg0
wandb: Uploading history metrics
wandb: Updating run and uploading files
wandb:                                                                                
wandb: Finished!
  • I confirmed that the dry run was uploaded and synced on localhost:8080 via the web browser.
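
The warning in the second step says Docker discards published ports when host network mode is used, so the -p 8080:8080 option has no effect in that command. A small sketch of how the two variants of the command differ follows. wandb_local_command is a hypothetical helper that only builds the argument list from the flags quoted in this thread; it does not run Docker.

```python
import shlex
from typing import List


def wandb_local_command(host_network: bool, port: int = 8080) -> List[str]:
    """Build the docker run arguments for the wandb/local container."""
    cmd = ["docker", "run", "--rm", "-d", "-v", "wandb:/vol"]
    if host_network:
        # The container shares the host's network stack, so -p would be discarded
        cmd += ["--network", "host"]
    else:
        # Default networking: publish the container's port 8080 on the host
        cmd += ["-p", f"{port}:8080"]
    cmd += ["--name", "wandb-local", "wandb/local"]
    return cmd


if __name__ == "__main__":
    for mode in (True, False):
        print(" ".join(shlex.quote(part) for part in wandb_local_command(mode)))
```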

I'm not exactly sure how it ended up working, but it seems to be working fine now. Is there anything I can provide you that might help if this is a bug?

@vanpelt
Contributor

vanpelt commented Sep 21, 2020

Thank goodness! Looks like it was actually the --network host flag. Good to know!

@aclifton314
Author

Just curious, does that --network host flag need to go into future versions of wandb local? If so I can file an official bug report or provide whatever information is needed.

@tyomhak

tyomhak commented Sep 23, 2020

@vanpelt ?

@github-actions

This issue is stale because it has been open 60 days with no activity.

@github-actions github-actions bot added the stale label Dec 19, 2020
@jawadSajid

jawadSajid commented Dec 30, 2020

I keep getting this error when I run this:

 docker run --rm -d -v wandb:/vol -p 8080:8080 --network host --name wandb-local wandb/local

error

And when I run:

 docker run --rm -d -v wandb:/vol -p 8080:8080  --name wandb-local wandb/local

I am able to log in to the wandb portal, but model training is not synced. Kindly help me with this.

@vanpelt
Contributor

vanpelt commented Dec 30, 2020

We'll need the debug bundle. You can access it at http://13.92.184.80/system-admin from the menu in the upper right corner. You can email this to vanpelt@wandb.com

@github-actions github-actions bot removed the stale label Dec 31, 2020
@github-actions

github-actions bot commented Mar 2, 2021

This issue is stale because it has been open 60 days with no activity.

@github-actions github-actions bot added the stale label Mar 2, 2021
@afiaka87

For the record - I ran into this exact same issue because I accidentally ran wandb from a different conda environment than the one it was originally set up in.
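
A quick way to check for this kind of mismatch is to print which interpreter, conda environment, and wandb installation the current shell resolves to. The sketch below is illustrative: wandb_environment is a hypothetical helper, and CONDA_DEFAULT_ENV is the variable conda sets when an environment is activated (it is absent otherwise).

```python
import importlib.util
import os
import shutil
import sys
from typing import Dict, Optional


def wandb_environment() -> Dict[str, Optional[str]]:
    """Report which Python, conda env, wandb CLI and wandb package are in use."""
    spec = importlib.util.find_spec("wandb")  # None if wandb is not importable here
    return {
        "python": sys.executable,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "wandb_cli": shutil.which("wandb"),  # first `wandb` found on PATH
        "wandb_package": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in wandb_environment().items():
        print(f"{key}: {value}")
```

If the wandb CLI path and the wandb package path sit under different environment directories, the command line and the training script are probably not using the same installation.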

@ariG23498
Contributor

Closing this ticket due to lack of activity.
Please feel free to comment to reopen.
