vdk-jobs-troubleshooting: Add thread-dump utility #1456

Merged
merged 7 commits into from
Jan 5, 2023


doks5
Contributor

@doks5 doks5 commented Dec 16, 2022

This change adds the implementation of a thread-dump utility to the vdk-jobs-troubleshooting plugin.

The utility starts an HTTP server through which an administrator can force a stack trace dump of all threads used by the Python process of the data job. The server is bound to a port on localhost, so to get the stack trace, one needs to be attached to the data job pod.
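A minimal sketch of the mechanism described above, using only the Python standard library. It is not the plugin's actual code: the `/threads` path, the function and class names, and the choice to return the dump in the HTTP response body (rather than writing it to the job logs) are all assumptions made for illustration.

```python
import sys
import threading
import traceback
from http.server import BaseHTTPRequestHandler, HTTPServer


def format_thread_dump() -> str:
    """Return the current stack trace of every thread in this process."""
    # Map thread ids to names so the dump is easier to read.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    # sys._current_frames() is a CPython facility that maps thread id -> top frame.
    for thread_id, frame in sys._current_frames().items():
        lines.append(f"Thread: {names.get(thread_id, 'unknown')} (id={thread_id})")
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
        lines.append("")
    return "\n".join(lines)


class ThreadDumpHandler(BaseHTTPRequestHandler):
    """Serves a thread dump on GET /threads (hypothetical path)."""

    def do_GET(self):
        if self.path != "/threads":
            self.send_error(404)
            return
        body = format_thread_dump().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default per-request logging to stderr.
        pass


def start_thread_dump_server(port: int = 0) -> HTTPServer:
    """Start the server on localhost only; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), ThreadDumpHandler)
    worker = threading.Thread(
        target=server.serve_forever, name="thread-dump-server", daemon=True
    )
    worker.start()
    return server
```

Binding to `127.0.0.1` rather than `0.0.0.0` is what restricts access to callers inside the pod, and running the server in a daemon thread keeps it from blocking the data job or preventing the process from exiting.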

Testing Done: Added unit tests for the utility registry, and tested the plugin itself locally by running a simple data job and examining the execution logs.

Signed-off-by: Andon Andonov <andonova@vmware.com>

@doks5 doks5 force-pushed the person/andonova/vdk-job-troubleshooting-plugin branch from 3d7afdc to 165377f Compare December 16, 2022 14:56
@doks5 doks5 force-pushed the person/andonova/vdk-job-troubleshooting-plugin branch from 165377f to 21bc48f Compare December 16, 2022 15:52
Collaborator

@antoniivanov antoniivanov left a comment


Please write a functional test. The cookiecutter provides a template: https://github.com/tozka/cookiecutter-vdk-plugin/blob/main/vdk-%7B%7Bcookiecutter.plugin_name%7D%7D/tests/test_plugin.py

Functional tests are a very powerful mechanism for making sure the functionality we write really works as expected, because they are real "data jobs" that use that functionality.

Add functional test.
Minor code refactoring.

Signed-off-by: Dako Dakov <ddakov@vmware.com>
@dakodakov
Collaborator

Please write a functional test. The cookiecutter provides a template: https://github.com/tozka/cookiecutter-vdk-plugin/blob/main/vdk-%7B%7Bcookiecutter.plugin_name%7D%7D/tests/test_plugin.py

Functional tests are a very powerful mechanism for making sure the functionality we write really works as expected, because they are real "data jobs" that use that functionality.

I just added a functional test.

Add documentation.
Remove redundant Optional type.

Signed-off-by: Dako Dakov <ddakov@vmware.com>
Remove unused import.

Signed-off-by: Dako Dakov <ddakov@vmware.com>
Remove unused import.

Signed-off-by: Dako Dakov <ddakov@vmware.com>
@murphp15
Collaborator

murphp15 commented Jan 4, 2023

I think that there should be docs describing how to use it.
Especially to emphasise that the onus is on the developer to open a proxy to the pod if they want to use this functionality.
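As a rough illustration of that workflow: because the server listens only on localhost inside the pod, the developer first needs a tunnel to it (for example with `kubectl port-forward`) and can then request the dump through the forwarded local port. The sketch below assumes such a tunnel is already open; the `fetch_thread_dump` helper and the `/threads` path are hypothetical and not taken from the plugin.

```python
import http.client


def fetch_thread_dump(local_port: int, path: str = "/threads", timeout: float = 5.0) -> str:
    """Request a thread dump through a locally forwarded port.

    Assumes a tunnel to the data job pod is already open on `local_port`;
    the default path is a placeholder, not the plugin's documented endpoint.
    """
    conn = http.client.HTTPConnection("127.0.0.1", local_port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read().decode("utf-8", errors="replace")
        if response.status != 200:
            raise RuntimeError(f"unexpected HTTP status {response.status}")
        return body
    finally:
        # Always release the socket, even if the request fails.
        conn.close()
```

Whether the dump comes back in the response or only appears in the job's execution logs depends on how the utility is implemented, so the return value here should be treated as illustrative.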

Collaborator

@antoniivanov antoniivanov left a comment


Looks good to me.

@dakodakov dakodakov merged commit df2798f into main Jan 5, 2023
@dakodakov dakodakov deleted the person/andonova/vdk-job-troubleshooting-plugin branch January 5, 2023 16:30
@antoniivanov
Collaborator

I can see it's not released yet (the release job did not trigger). It's commented out, but I think you can release it:
https://github.com/vmware/versatile-data-kit/blob/main/projects/vdk-plugins/vdk-jobs-troubleshooting/.plugin-ci.yml#L33

duyguHsnHsn pushed a commit that referenced this pull request Jan 6, 2023
* vdk-jobs-troubleshooting: Add thread-dump utility

This change adds the implementation of a thread-dump utility to the
vdk-jobs-troubleshooting plugin.

The utility uses an http server, through which an administrator is able
to force a stacktrace dump of all threads used by the python process of
the data job. The server is bound to a port on the localhost, so to get
the stacktrace, one needs to be attached to the data job pod.

Testing Done: Added unit tests for the utility registry, and tested the
plugin itself locally by running a simple data job and examining the
execution logs.

Signed-off-by: Andon Andonov <andonova@vmware.com>

* vdk-jobs-troubleshooting: Add functional test

Add functional test.
Minor code refactoring.

Signed-off-by: Dako Dakov <ddakov@vmware.com>

* vdk-jobs-troubleshooting: Address review feedback

Add documentation.
Remove redundant Optional type.

Signed-off-by: Dako Dakov <ddakov@vmware.com>

* vdk-jobs-troubleshooting: remove unused import

Remove unused import.

Signed-off-by: Dako Dakov <ddakov@vmware.com>

* vdk-jobs-troubleshooting: remove unused import

Remove unused import.

Signed-off-by: Dako Dakov <ddakov@vmware.com>

* vdk-jobs-troubleshooting: Address review feedback

Remove redundancies.

Signed-off-by: Dako Dakov <ddakov@vmware.com>

Signed-off-by: Andon Andonov <andonova@vmware.com>
Signed-off-by: Dako Dakov <ddakov@vmware.com>
Co-authored-by: ddakov <ddakov@vmware.com>
5 participants