Using Regex to split shell script #717
I am using a new module, Regex.
This is awesome! The results are quite accurate. Well done!! I have a few minor changes and considerations for you. Other than that, you can use this method in your implementation.
tern/utils/general.py
@@ -6,6 +6,7 @@
import os
import random
import re
import regex as mrab |
regex is not part of the standard library, so you will need to add it to requirements.in and requirements.txt.
Small nit: I don't think it's necessary to alias regex. re is a very commonly used Python module and is included in the standard library.
Add function split_shell_script() in tern/utils/general.py.
This function receives a shell script and splits it into statements:
- command
- variable
- loop
- branch
In this commit, Regex is used to split the shell script by separators,
which are '&&', '||', '|', ';', and ':;' (this is a special case).
To skip quotes while splitting, use (*SKIP)(*F). This is a feature
of Regex that is not in re (the Python standard library module).
Add function parse_shell_variables_and_command().
Add function parse_shell_loop_and_branch().
Works towards tern-tools#682.
Signed-off-by: WangJL <hazard15020@gmail.com>
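For readers without the regex module installed, the idea of skipping quoted text while splitting can be sketched with the standard library alone. This is not the code from this PR: it swaps (*SKIP)(*F) for an alternation that consumes quoted strings before any separator is considered. The function name split_outside_quotes is hypothetical, and the ':;' special case is left out.

```python
import re

# Quoted strings are listed first in the alternation, so a separator that
# sits inside quotes is consumed as part of the string and never matched
# on its own. This stands in for (*SKIP)(*F), which re does not support.
_TOKEN = re.compile(r"""
    "(?:\\.|[^"\\])*"        # double-quoted string, allowing escapes
  | '[^']*'                  # single-quoted string
  | (?P<sep>&&|\|\||\||;)    # separators to split on
""", re.VERBOSE)


def split_outside_quotes(script):
    """Split a shell snippet on &&, ||, | and ; that are not inside quotes."""
    parts, start = [], 0
    for match in _TOKEN.finditer(script):
        if match.group("sep") is None:
            continue  # a quoted string: keep it inside the current part
        parts.append(script[start:match.start()].strip())
        start = match.end()
    parts.append(script[start:].strip())
    return [part for part in parts if part]


print(split_outside_quotes('echo "a && b" && ls; cd /tmp || exit 1'))
```

The sketch ignores comments, here-documents, and command substitution, so it should be read as an illustration of the quote-skipping idea rather than a full shell tokenizer.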
I have implemented the rest of the split function. Here is the local test output for the Dockerfiles under
I found another issue.
Currently we cannot remove
Maybe we can use
Overall, it looks good! I would only ask for you to complete the two TODOs you have recorded in this commit. If you think you need to add a new commit on top of this one, you can go ahead and do that.
tern/utils/general.py
# TODO remove \t \r \n \ in the command
statement['command'] = parse_command(concatenated_command)
return statement
You can fix this issue in this PR as well. If you need to, add another commit on top of this one.
This commit enables parse_shell_loop_and_branch() to extract statements
for a loop, and uses clean_command() to clean up tabs and line indentation.
Modify the statement dictionary: change the dict for a command from
{'content':..., 'command':...} to {'command':...}. Since we will call
parse_command() later, when finding the installed software, we only
record the concatenated command string here.
Works towards tern-tools#682 and tern-tools#706.
Signed-off-by: WangJL <hazard15020@gmail.com>
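The cleanup described in this commit message (removing \t, \r, \n, and the trailing \ of a line continuation) could look roughly like the sketch below. The actual clean_command() in tern may differ; clean_command_sketch is a hypothetical stand-in written only from the TODO and the commit message.

```python
import re


def clean_command_sketch(command):
    """Hypothetical stand-in for clean_command(): flatten a multi-line command."""
    # Replace backslash-newline line continuations with a single space.
    command = re.sub(r"\\\r?\n", " ", command)
    # Collapse tabs, carriage returns, newlines and repeated spaces.
    return re.sub(r"\s+", " ", command).strip()


print(clean_command_sketch("apt-get install -y \\\n\t\tgit \\\n\t\tcurl\r\n"))
```

Note that this also collapses whitespace inside quoted arguments, which a real implementation would probably want to preserve.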
I added a new commit, and here are the changes:
Attached is the output from my local test.
@ForgetMe17 I tried your changes on the golang RUN statement:
I get this output:
I think this looks pretty good! I wonder if you can take care of a few small things here: For command If we are monitoring variable assignments, we may want to also take into account
Thanks for your advice! I have made a quick fix in the latest commit.
The pipe symbol '|' should not be separated.
The 'export' command can be treated as a variable assignment.
Works towards tern-tools#682.
Signed-off-by: WangJL <hazard15020@gmail.com>
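As an illustration of the two rules in this commit message, a statement classifier could be sketched as follows. The function name and the shape of the 'variable' entry are assumptions made for the example; only the {'command': ...} shape comes from the earlier commit message.

```python
import re

# An optional leading "export", then NAME=VALUE. DOTALL lets the value
# span several lines.
_ASSIGN = re.compile(
    r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$", re.DOTALL
)


def classify_statement(statement):
    """Hypothetical helper: label a statement as a variable or a command."""
    statement = statement.strip()
    match = _ASSIGN.match(statement)
    if match:
        # 'export NAME=value' is treated the same as 'NAME=value'.
        return {"variable": {"name": match.group(1), "value": match.group(2)}}
    # A pipeline is kept whole: '|' is not a split point here.
    return {"command": statement}


print(classify_statement("export GOPATH=/go"))
print(classify_statement("curl -fsSL https://example.com/go.tgz | tar -xz"))
```

One known gap: a prefix assignment such as `FOO=bar make` would be classified as a variable by this sketch, so a real implementation would need to look at what follows the value.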
Here are the test results.
is now
Add function split_shell_script() in tern/utils/general.py.
This function receives a shell script and splits it into statements.
In this commit, Regex is used to split the shell script by separators, which are &&, ||, |, ;, and :; (this is a special case).
To skip quotes while splitting, use (*SKIP)(*F). This is a feature of Regex that is not in re (the Python standard library module).
Works towards #682.
Signed-off-by: WangJL <hazard15020@gmail.com>