Unable to run demo: Key error "data" #89

Open
gbezerra opened this issue Dec 30, 2021 · 7 comments

@gbezerra

gbezerra commented Dec 30, 2021

I'm trying to run the data lineage wikimedia demo but I'm running into an error:

Traceback (most recent call last):
  File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/georgebezerra/.vscode/extensions/ms-python.python-2021.12.1559732655/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/Users/georgebezerra/.vscode/extensions/ms-python.python-2021.12.1559732655/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/Users/georgebezerra/.vscode/extensions/ms-python.python-2021.12.1559732655/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/georgebezerra/Dev/demo.py", line 19, in <module>
    source = catalog.add_source(name="wikimedia", source_type="postgresql", **wikimedia_db)
  File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/site-packages/data_lineage/__init__.py", line 319, in add_source
    payload = self._post(path="sources", data=data, type="sources")
  File "/Users/georgebezerra/opt/anaconda3/lib/python3.8/site-packages/data_lineage/__init__.py", line 144, in _post
    return response.json()["data"]
KeyError: 'data'

The Docker piece seems to be running fine, except for the tokern worker, which is returning the following message:

/docker-entrypoint.sh: 11: exec: rq: not found

This is running on a MacBook Pro with an M1 chip.
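
For anyone debugging the same worker message: "exec: rq: not found" means the shell could not find an executable named rq on its PATH. A minimal sketch of a check that can be run with the Python interpreter inside the worker container; `find_executable` is a hypothetical helper name, not part of data_lineage, and the sketch only reports whether the command is visible, not why it is missing.

```python
import shutil


def find_executable(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable("rq")
    if path is None:
        # The package that provides the command may not be installed
        # in this environment, or its bin directory is not on PATH.
        print("rq is not on PATH")
    else:
        print("rq found at", path)
```

If this prints "rq is not on PATH" inside the container, the problem is with the image contents rather than with the demo script.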

@vrajat
Member

vrajat commented Jan 2, 2022

There are a couple of different problems. The client call did not get the response it expected. Specifically, there is no data field in the JSON response. Can you paste the logs from the Docker container (docker container logs <id>)?

The worker error is a separate problem. I'll check why the worker did not start in the container.
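
To make the first failure easier to diagnose, the client side can check the response before indexing into it. The following is only a sketch under stated assumptions: `extract_data` is a hypothetical helper, not the library's actual `_post` implementation, and it takes the status code and the already-parsed JSON body as plain values.

```python
def extract_data(status_code, body):
    """Return body["data"], or raise an error that shows what the server sent."""
    if status_code >= 400:
        # Surface the HTTP error instead of failing later with a KeyError.
        raise RuntimeError(f"server returned HTTP {status_code}: {body!r}")
    if not isinstance(body, dict) or "data" not in body:
        raise RuntimeError(f"response has no 'data' field: {body!r}")
    return body["data"]
```

With a requests-style response object this could be called as extract_data(response.status_code, response.json()). The error message then contains the actual payload, which is usually the quickest hint about what went wrong on the server.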

@marcellovictorino

I have the same problem with the worker failing. Apparently, rq is not a recognized command (it is listed as a dependency in the toml file, but is perhaps not available in the Docker image?).

I am struggling to follow the example.
It fails when importing Analyze from data_lineage; perhaps it references an outdated version?

It is also not entirely clear which parameters I should change to configure a connection to a specific database. The documentation could be improved here.

Would it be possible to query Snowflake myself and store the results as static files, to be parsed by data_lineage? I would like to start small, but with a working example.

@vrajat
Member

vrajat commented Jan 22, 2022

Sigh. rq is not working because I haven't released a new version with it. I'll do that soon.

@marcellovictorino It is possible to use the data lineage API programmatically. I find it easier, but there is a learning curve. However, I don't have a good idea of how to bring you up to speed other than by reading the code.

@dennysrega1ado

dennysrega1ado commented Mar 29, 2022

Hi @vrajat, I'm curious: which URL should be passed to scan = Scan(docker_address)? tokern-viz or tokern-api?

HTTPError: 404 Client Error: NOT FOUND for url: http://127.0.0.1:8000/api/v1/scan

data_lineage --version #0.8.5
❯ netstat -ntlp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:35205         0.0.0.0:*               LISTEN      6220/python
tcp        0      0 127.0.0.1:9000          0.0.0.0:*               LISTEN      6220/python
tcp        0      0 127.0.0.1:9001          0.0.0.0:*               LISTEN      6220/python
tcp        0      0 127.0.0.1:9002          0.0.0.0:*               LISTEN      6220/python
tcp        0      0 127.0.0.1:9003          0.0.0.0:*               LISTEN      6220/python
tcp        0      0 127.0.0.1:9004          0.0.0.0:*               LISTEN      6220/python
tcp        0      0 127.0.0.1:44409         0.0.0.0:*               LISTEN      5861/node
tcp6       0      0 :::8000                 :::*                    LISTEN      -
❯ docker ps
CONTAINER ID   IMAGE                            COMMAND                  CREATED          STATUS          PORTS                    NAMES
3c3ae085fc87   tokern/data-lineage:latest       "/bin/sh -c '/docker…"   21 minutes ago   Up 21 minutes   0.0.0.0:4142->4142/tcp   tokern-data-lineage
6566a715479c   tokern/data-lineage:latest       "/bin/sh -c '/docker…"   2 hours ago      Up 56 minutes                            tokern_worker
e82b0a0d1f1d   tokern/data-lineage-viz:latest   "/docker-entrypoint.…"   9 hours ago      Up 2 hours      0.0.0.0:8000->80/tcp     tokern-data-lineage-visualizer
8f523dadd73d   postgres:13.2-alpine             "docker-entrypoint.s…"   9 hours ago      Up 2 hours                               tokern-catalog
4cf7df31d3f2   redis:6.2.6-alpine               "docker-entrypoint.s…"   9 hours ago      Up 2 hours                               tokern-redis

A jupyter notebook running in my host

import time

docker_address = "http://127.0.0.1:8000"
scan = Scan(docker_address)
job = scan.start(source)

# Wait for the scan to reach a terminal state
status = ""
while status not in ("finished", "failed"):
    time.sleep(5)
    status = scan.get(job["id"])["status"]
    print("Status is {}".format(status))

HTTPError: 404 Client Error: NOT FOUND for url: http://127.0.0.1:8000/api/v1/scan
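
A side note on the polling loop above: once the scan request itself succeeds, that loop will spin forever if the job never reaches "finished" or "failed", which could happen while the worker is not running. A minimal sketch of the same loop with a deadline follows. `wait_for_job` is a hypothetical helper, not part of data_lineage, and `get_status` is assumed to be any callable that returns the current status string.

```python
import time


def wait_for_job(get_status, timeout=300.0, interval=5.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("finished", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        sleep(interval)
```

With the names from the notebook snippet, this could be called as wait_for_job(lambda: scan.get(job["id"])["status"]). The sleep and clock parameters exist only so the function can be exercised without real waiting.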

@Opperessor

I got the same issue. Is it resolved?

@Opperessor

/docker-entrypoint.sh: 11: exec: rq: not found
Is it fixed?

@debedb

debedb commented Oct 23, 2022

@Opperessor no, I just ran into the same problem.

I'm trying to run this programmatically as well, but at the moment http://localhost:8080 gives me a Not Found error (when I run docker-compose, at least the website shows up).

I'm going to continue to dig through because I am really interested in the promise of this project. Will try to post results or maybe a PR.
