Split commands from shell script in dockerfile #682
Comments
An example of RUN commands in a Dockerfile that is difficult to parse:
This PR adds two functions to utils/general.py:
1. split_shell_script(): splits a shell script (given as a string) into three types of statements: variable, command, and branch (if and case). Each statement is stored in a dictionary with the following keys:
- type: statement type (variable, command, if, or case)
- content: the statement text (e.g. 'apt-get update')
- name: (variables only) variable name
- value: (variables only) variable value
More keys will be added when we move on to parsing the branches. The function returns a list of statements.
2. parse_shell_script(): parses a shell script to get a command list. It first calls split_shell_script() to get a statement list; for each command statement, it then calls parse_command() to extract the command name, options, and words.
Works towards tern-tools#682. Signed-off-by: WangJL <hazard15020@gmail.com>
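The statement dictionary described in this commit message could look roughly like the sketch below. It is only an illustration under assumptions, not the code from the PR: the function name classify_statement() and the assignment pattern are hypothetical, and a statement such as FOO=bar some-command (an environment prefix) would be misclassified as a variable.

```python
import re

# Hypothetical pattern: NAME=value at the start of a statement.
_ASSIGNMENT = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*)=(.*)$', re.DOTALL)


def classify_statement(statement):
    """Return a statement dictionary with the keys listed in the commit."""
    content = statement.strip()
    match = _ASSIGNMENT.match(content)
    if match:
        # Variables carry the two extra keys 'name' and 'value'.
        return {'type': 'variable', 'content': content,
                'name': match.group(1), 'value': match.group(2)}
    first_word = content.split(None, 1)[0] if content else ''
    if first_word in ('if', 'case'):
        # Branches are only labelled here; their bodies are not parsed.
        return {'type': first_word, 'content': content}
    return {'type': 'command', 'content': content}


statements = [classify_statement(s) for s in (
    'dpkgArch="$(dpkg --print-architecture)"',
    'apt-get update',
)]
```

Keeping each statement as a plain dict makes it easy to add more keys later, which is what the commit message anticipates for branches.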
This commit removes one function from the previous commit and adds three new ones.
Removed:
1. parse_shell_script(): no longer needed, since this commit moves command parsing into split_shell_script().
Added:
1. parse_shell_variables_and_command(): given a statement, classifies it as a variable or a command and then parses it.
2. parse_shell_if_branch(): extracts all commands in an if statement and returns a branch dictionary.
3. parse_shell_case_branch(): extracts all commands in a case statement and returns a branch dictionary.
Works towards tern-tools#682. Signed-off-by: WangJL <hazard15020@gmail.com>
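As a rough illustration of what extracting the commands from an if statement might involve, here is a minimal sketch. The function name parse_if_branch() and the keys 'condition', 'then', and 'else' are hypothetical; the commit does not say which keys the branch dictionary uses. The sketch assumes a single-line statement, splits naively on ';' (so quoted semicolons would break it), and does not handle elif.

```python
def parse_if_branch(statement):
    """Collect the commands of a single-line 'if ...; then ...; fi'."""
    branch = {'type': 'if', 'condition': '', 'then': [], 'else': []}
    section = None  # which list the next plain command belongs to
    for part in statement.split(';'):
        part = part.strip()
        if not part or part == 'fi':
            continue
        keyword, _, rest = part.partition(' ')
        if keyword == 'if':
            branch['condition'] = rest.strip()
        elif keyword in ('then', 'else'):
            section = keyword
            if rest.strip():
                branch[section].append(rest.strip())
        elif section:
            branch[section].append(part)
    return branch


branch = parse_if_branch(
    'if [ "$arch" = amd64 ]; then wget a; tar xf a; '
    'else echo unsupported; exit 1; fi')
```

Recording both sections keeps every command that might install software, regardless of which path runs at build time.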
Here is a shell script containing |
Add function split_shell_script() in tern/utils/general.py. This function receives a shell script and splits it into statements:
- command
- variable
- loop
- branch
In this commit, a regular expression splits the shell script on the separators '&&', '||', '|', ';', and ':;' (a special case). To skip quoted text while splitting, it uses (*SKIP)(*F), which is a feature of the third-party regex module, not of Python's built-in re module. Works towards tern-tools#682. Signed-off-by: WangJL <hazard15020@gmail.com>
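The idea of splitting on separators while leaving quoted text alone can be sketched with the standard library only. Note the swap: this uses a hand-written character scanner in place of the (*SKIP)(*F) pattern from the commit, so it needs no third-party module. The function name split_on_separators() is hypothetical, and the sketch does not handle the ':;' special case, command substitutions, or the ';;' terminator inside case statements.

```python
def split_on_separators(script):
    """Split on '&&', '||', '|' and ';' that appear outside quotes."""
    statements = []
    current = []
    quote = None  # the quote character we are currently inside, if any
    i = 0
    while i < len(script):
        char = script[i]
        if quote:
            current.append(char)
            if char == '\\' and quote == '"' and i + 1 < len(script):
                # Keep an escaped character inside double quotes as-is.
                current.append(script[i + 1])
                i += 2
                continue
            if char == quote:
                quote = None
            i += 1
            continue
        if char in ('"', "'"):
            quote = char
            current.append(char)
            i += 1
            continue
        if script[i:i + 2] in ('&&', '||'):
            statements.append(''.join(current).strip())
            current = []
            i += 2
            continue
        if char in (';', '|'):
            statements.append(''.join(current).strip())
            current = []
            i += 1
            continue
        current.append(char)
        i += 1
    statements.append(''.join(current).strip())
    # Drop empty pieces, e.g. after a trailing ';'.
    return [s for s in statements if s]


parts = split_on_separators(
    'apt-get update && apt-get install -y curl; echo "a; b && c"')
```

Here the separators inside the double-quoted string stay part of the echo statement, which is the behavior the quote-skipping pattern is meant to provide.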
Dockerfiles are added at tests/dockerfiles/split_shell_script. Works towards tern-tools#682. Signed-off-by: WangJL <hazard15020@gmail.com>
Dockerfiles are added at tests/dockerfiles/split_shell_script. Works towards #682. Signed-off-by: WangJL <hazard15020@gmail.com>
Add function split_shell_script() in tern/utils/general.py. This function receives a shell script and splits it into statements:
- command
- variable
- loop
- branch
In this commit, a regular expression splits the shell script on the separators '&&', '||', '|', ';', and ':;' (a special case). To skip quoted text while splitting, it uses (*SKIP)(*F), which is a feature of the third-party regex module, not of Python's built-in re module. Add function parse_shell_variables_and_command(). Add function parse_shell_loop_and_branch(). Works towards tern-tools#682. Signed-off-by: WangJL <hazard15020@gmail.com>
This commit enables parse_shell_loop_and_branch() to extract statements for a loop. Works towards tern-tools#682. Signed-off-by: WangJL <hazard15020@gmail.com>
This commit enables parse_shell_loop_and_branch() to extract statements for a loop, and uses clean_command() to clean tabs and line indentation. It also modifies the statement dictionary, changing the dict for a command from {'content':..., 'command':...} to {'command':...}. Since parse_command() will run in a later step that finds the installed software, only the concatenated command string is recorded here. Works towards tern-tools#682 and tern-tools#706. Signed-off-by: WangJL <hazard15020@gmail.com>
The pipe symbol '|' should not be treated as a separator. The 'export' command can be treated as a variable assignment. Works towards tern-tools#682 and tern-tools#706. Signed-off-by: WangJL <hazard15020@gmail.com>
The pipe symbol '|' should not be treated as a separator. The 'export' command can be treated as a variable assignment. Works towards tern-tools#682. Signed-off-by: WangJL <hazard15020@gmail.com>
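Treating export as a variable assignment can be sketched as follows. This is an assumption-laden illustration, not the code from the PR: the name parse_variable() and the pattern are hypothetical, and only the simple export NAME=value form is recognized.

```python
import re

# Optional 'export ' prefix followed by NAME=value (hypothetical pattern).
_EXPORT_ASSIGNMENT = re.compile(
    r'^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$', re.DOTALL)


def parse_variable(statement):
    """Return a variable dict, or None if this is not an assignment."""
    match = _EXPORT_ASSIGNMENT.match(statement.strip())
    if not match:
        return None
    return {'type': 'variable',
            'name': match.group(1),
            'value': match.group(2)}


exported = parse_variable('export PATH=/usr/local/go/bin:$PATH')
```

Forms such as export -p or export NAME without a value return None here and would fall through to ordinary command handling.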
Fixed in PR #717. Closing.
Description
As the first step in parsing Dockerfile RUN instructions, we need to split the RUN instruction's shell script into individual commands. A full shell script contains these elements:
- variable assignment: e.g. dpkgArch="$(dpkg --print-architecture)";
- command line: e.g. wget -O go.tgz "$url";
- branch: e.g. case .... esac; , if ... fi;
- function: e.g. foo(){.....}
In this task, we need to parse a shell script into the above parts and store them, possibly in a list. For branches, we may need another task to parse their contents into variable assignments and command lines.
To Do
Given a full shell script from a Dockerfile RUN instruction, return a data structure that stores the parsing result. Most modifications will be made in general.py.
- parse_shell_script() in general.py as the main function. It receives the string of a RUN instruction, which is a shell script, and returns a command dict containing all possible commands in the given shell script.
- parse_command() in general.py to parse the command dict and find possibly installed software. This could be implemented in follow-up task issues.
Some things need to be discussed:
- Maybe we can define a class for the parsed elements, with a property that shows what kind of element it is, such as variable assignment or command line. Further parsing, such as parsing a branch, can be implemented as a class method. Or maybe we can use a dict for each element, with a key called statement whose value can be variable assignment, command line, etc., and add new functions in general.py.
- branch. I am thinking of adding an extra key or property to the command line indicating whether it is in a branch. At first we can take the default branch as the command to parse. Later, we can add more analysis, such as considering variables, or whether the branch only affects the version we install but not the name of the software we install.
- function?
Background
On looking at the type of parsing needed for full shell scripts embedded in the run command, we may need to develop a shell script parser to catch all places where software could have been installed.
Super Issues
#521