Split commands from shell script in dockerfile #682

Closed
ForgetMe17 opened this issue May 6, 2020 · 3 comments

@ForgetMe17
Contributor

Description
As a first step toward parsing the commands in a Dockerfile RUN instruction, we need to split the shell script of a RUN instruction into its individual commands. A full shell script can contain these elements:

  • variable assignment: e.g. dpkgArch="$(dpkg --print-architecture)";
  • command line: e.g. wget -O go.tgz "$url";
  • branch: e.g. case ... esac;, if ... fi;
  • function: e.g. foo(){ ... }

In this task, we need to parse a shell script into the above parts and store them, possibly in a list. For branches, we may need a separate task to parse each branch into variable assignments and command lines.
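To make the four element types above concrete, here is a minimal Python sketch that labels a single, already-split statement. The name classify_statement and its regex heuristics are hypothetical illustrations, not tern code; a real shell parser would need to handle many more cases (for example, `FOO=bar cmd` is a command with an environment prefix, but this sketch would label it a variable).

```python
import re

# Hypothetical heuristics for the element types listed in the issue.
ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")          # NAME=...
FUNCTION = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*\(\)\s*\{")   # name() {
BRANCH_KEYWORDS = ("if ", "case ")


def classify_statement(statement: str) -> str:
    """Return 'variable', 'branch', 'function' or 'command'."""
    text = statement.strip()
    if ASSIGNMENT.match(text):
        return "variable"
    if text.startswith(BRANCH_KEYWORDS):
        return "branch"
    if FUNCTION.match(text):
        return "function"
    # Anything else is treated as an ordinary command line.
    return "command"


for s in ['dpkgArch="$(dpkg --print-architecture)"',
          'wget -O go.tgz "$url"',
          'case "$dpkgArch" in amd64) arch=x86_64 ;; esac',
          'foo(){ echo hi; }']:
    print(classify_statement(s), "<-", s)
```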

To Do
Given a full shell script from the RUN instruction of a Dockerfile, return a data structure that stores the parsing result. Most modifications will be made in general.py.

  • We can add a function called parse_shell_script() to general.py as the main function. It receives the string of a RUN instruction, which is a shell script, and returns a command dict containing all the possible commands in the given shell script.
  • Once this command dict is available, we can modify parse_command() in general.py to parse it and find the software that may have been installed. This could be implemented in follow-up issues.
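A rough idea of how a command string could be broken into a name, options and words, and how install targets might then be picked out, is sketched below. parse_command_sketch and installed_packages are hypothetical helpers written for this illustration, not the existing parse_command() in general.py; tokenizing relies on Python's shlex.split().

```python
import shlex


def parse_command_sketch(command: str) -> dict:
    """Hypothetical: split one command line into name, options and words.

    Tokens starting with '-' are options; every other token after the
    command name is treated as a plain word.
    """
    tokens = shlex.split(command)  # respects shell-style quoting
    if not tokens:
        return {"name": "", "options": [], "words": []}
    name, rest = tokens[0], tokens[1:]
    options = [t for t in rest if t.startswith("-")]
    words = [t for t in rest if not t.startswith("-")]
    return {"name": name, "options": options, "words": words}


def installed_packages(parsed: dict) -> list:
    """Hypothetical: return the words after an 'install' or 'add' subcommand."""
    words = parsed["words"]
    for subcommand in ("install", "add"):
        if subcommand in words:
            return words[words.index(subcommand) + 1:]
    return []


parsed = parse_command_sketch(
    "apt-get install -y --no-install-recommends wget ca-certificates")
print(parsed)
print(installed_packages(parsed))
```

With this naive rule, option arguments such as the go.tgz in `-O go.tgz` end up in words, so a real implementation would need per-command knowledge of which options take values.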

Things that need to be discussed:

  • The data structure can be a list. For its elements:
  1. We could define a class for the parsed elements, with a property that shows what kind of element it is (variable assignment, command line, etc.). Further parsing, such as parsing a branch, could then be implemented as a method of the class.

  2. Or we could use a dict for each element, with a key called statement whose value is the element type (variable assignment, command line, etc.), and add new functions to general.py.

  • How to handle branches.
    I am thinking of adding an extra key or property to each command line indicating whether it is inside a branch.
    At first we can take the default branch as the command to parse. Later, we can add more analysis, such as considering variables, or checking whether a branch only affects the version we install and not the name of the software we install.
  • Should we consider functions?
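Option 1 above, together with the extra in-branch flag, might look like this as a Python dataclass. ShellElement, its fields and the commands() method are hypothetical names used only to illustrate the idea; option 2 would carry the same information in a plain dict.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ShellElement:
    """Hypothetical class for option 1: one parsed element of a RUN script."""
    kind: str                # 'variable', 'command', 'branch' or 'function'
    content: str             # the raw statement text
    in_branch: bool = False  # extra flag for commands found inside a branch
    children: List["ShellElement"] = field(default_factory=list)

    def commands(self) -> List[str]:
        """Collect command strings from this element and its nested elements."""
        found = [self.content] if self.kind == "command" else []
        for child in self.children:
            found.extend(child.commands())
        return found


branch = ShellElement(
    "branch", 'if [ -n "$x" ]; then apt-get update; fi',
    children=[ShellElement("command", "apt-get update", in_branch=True)])
script = [ShellElement("variable", "x=1"), branch,
          ShellElement("command", "apt-get install -y wget")]
print([cmd for element in script for cmd in element.commands()])
```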

Background
Looking at the type of parsing needed for full shell scripts embedded in RUN instructions, we may need to develop a shell script parser to catch all the places where software could have been installed.

Super Issues
#521

@nishakm
Contributor

nishakm commented May 12, 2020

An example of RUN commands in a Dockerfile that is difficult to parse:
https://github.com/docker-library/python/blob/06d7952f07eae237666f2fd0ccf6189a5d8ee83c/3.9-rc/buster/Dockerfile

ForgetMe17 added a commit to ForgetMe17/tern that referenced this issue May 14, 2020
This PR adds two functions to utils/general.py:
1. split_shell_script()
This function splits a shell script (in string format) into three types
of statements: variable, command and branch (if and case).
A dictionary is used to store the information of each statement.
It has the following keys:
- type: statement type (variable, command, if or case)
- content: statement sentence (e.g. 'apt-get update')
- name: (only for variable) variable name
- value: (only for variable) variable value
(more keys will be added when we move on to parsing the branches)
It returns a list of statements.

2. parse_shell_script()
This function parses a shell script to get a command list.
First it calls split_shell_script() to get a statement list.
For each command, it calls parse_command() to parse the command
and extract the command name, options and words.

Works towards tern-tools#682.

Signed-off-by: WangJL <hazard15020@gmail.com>
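The statement dicts described in this commit can be illustrated with a simplified Python sketch. split_shell_script_sketch is a hypothetical stand-in, not the function from the PR: it splits only on '&&', '||' and ';', does not protect separators inside quotes, and does not recognise if/case blocks.

```python
import re

# NAME=value, capturing the variable name and the raw value.
VARIABLE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$", re.DOTALL)


def split_shell_script_sketch(script: str) -> list:
    """Split a script into statement dicts with type/content/name/value keys.

    Simplification: separators inside quotes are NOT skipped here.
    """
    statements = []
    for piece in re.split(r"&&|\|\||;", script):
        content = piece.strip()
        if not content:
            continue  # e.g. a trailing ';'
        match = VARIABLE.match(content)
        if match:
            statements.append({"type": "variable", "content": content,
                               "name": match.group(1), "value": match.group(2)})
        else:
            statements.append({"type": "command", "content": content})
    return statements


for statement in split_shell_script_sketch(
        'url="https://example.com/go.tgz"; wget -O go.tgz "$url" && apt-get update'):
    print(statement)
```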
ForgetMe17 added a commit to ForgetMe17/tern that referenced this issue May 19, 2020
This commit removes one function from the previous commit
and adds three new functions.

Remove 1. parse_shell_script()
This is no longer needed, since this commit moves the command
parsing into split_shell_script().

Add 1. parse_shell_variables_and_command()
Given a sentence, classify it as a variable or a command,
and then parse it.

Add 2. parse_shell_if_branch()
Extract all commands in an if statement and return a branch dictionary.

Add 3. parse_shell_case_branch()
Extract all commands in a case statement and return a branch dictionary.

Works towards tern-tools#682.

Signed-off-by: WangJL <hazard15020@gmail.com>
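A minimal sketch of what extracting the commands of an if statement into a branch dictionary could look like. parse_if_branch_sketch and its output keys (branches, condition, commands) are hypothetical; it only handles a single, non-nested if/elif/else, and a keyword such as else appearing inside a command would confuse it.

```python
import re


def _split_commands(text: str) -> list:
    """Split a clause body on '&&', '||' and ';' and drop empty pieces."""
    return [c.strip() for c in re.split(r"&&|\|\||;", text) if c.strip()]


def parse_if_branch_sketch(statement: str) -> dict:
    """Hypothetical: collect the commands of each arm of a one-level
    `if ...; then ...; [elif ...; then ...;] [else ...;] fi` statement."""
    text = statement.strip()
    if not (text.startswith("if ") and text.endswith("fi")):
        raise ValueError("not an if statement: " + text)
    body = text[len("if "):-len("fi")]
    # A capturing group makes re.split keep the keywords:
    # [condition, keyword, clause, keyword, clause, ...]
    parts = re.split(r"\b(then|elif|else)\b", body)
    condition, arms = parts[0], []
    for keyword, clause in zip(parts[1::2], parts[2::2]):
        if keyword == "then":
            arms.append({"condition": condition.strip(" ;"),
                         "commands": _split_commands(clause)})
        elif keyword == "elif":
            condition = clause  # closed by the next 'then'
        else:  # the 'else' arm has no condition
            arms.append({"condition": None, "commands": _split_commands(clause)})
    return {"type": "if", "content": text, "branches": arms}


print(parse_if_branch_sketch(
    'if [ -n "$x" ]; then apt-get update; else echo skip; fi'))
```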
@ForgetMe17
Contributor Author

Here is a shell script containing a loop, and the current output is
buster-slim.txt.
In line 63, the for loop is treated as a command, so we need to consider adding another element type, loop, to the parsing procedure. I have filed issue #706 to fix this.
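One way to recognise a for loop as its own element type, instead of a command, is a pattern match on the `for ... in ...; do ...; done` form. parse_for_loop_sketch is a hypothetical helper for illustration only; it ignores nested loops as well as while/until loops.

```python
import re

# for VAR in ITEMS; do BODY; done  (the ';' before do/done is optional)
FOR_LOOP = re.compile(
    r"^for\s+(?P<var>[A-Za-z_][A-Za-z0-9_]*)\s+in\s+(?P<items>.*?);?\s*"
    r"do\s+(?P<body>.*?);?\s*done$",
    re.DOTALL,
)


def parse_for_loop_sketch(statement: str):
    """Hypothetical: return a loop dict for a simple for loop, else None."""
    match = FOR_LOOP.match(statement.strip())
    if match is None:
        return None
    commands = [c.strip() for c in re.split(r"&&|\|\||;", match.group("body"))
                if c.strip()]
    return {"type": "loop", "variable": match.group("var"),
            "items": match.group("items").split(), "commands": commands}


print(parse_for_loop_sketch(
    'for key in $GPG_KEYS; do gpg --recv-keys "$key"; done'))
```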

ForgetMe17 added a commit to ForgetMe17/tern that referenced this issue Jun 3, 2020
Add function split_shell_script() in tern/utils/general.py.
This function receives a shell script, split it into statements:
- command
- variable
- loop
- branch

In this commit, regular expressions are used to split the shell script by the separators
'&&', '||', '|', ';' and ':;' (a special case).
To skip quoted text while splitting, (*SKIP)(*F) is used. This feature is
supported by the third-party regex module, not by Python's built-in re module.

Works towards tern-tools#682.

Signed-off-by: WangJL <hazard15020@gmail.com>
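The (*SKIP)(*F) verbs are available only in the third-party regex module. For comparison, a similar quote-aware split can be sketched with the standard re module by matching quoted strings first and treating only the separators found outside them as split points. split_outside_quotes is a hypothetical name; following the later decision in this thread, a single '|' is deliberately not a separator here, and the ':;' special case is not handled.

```python
import re

# Quoted strings are matched first, so separators inside them never act as
# split points; this mimics the effect of (*SKIP)(*F) using only `re`.
TOKEN = re.compile(r"""'[^']*'|"(?:\\.|[^"\\])*"|&&|\|\||;""")
SEPARATORS = {"&&", "||", ";"}


def split_outside_quotes(script: str) -> list:
    """Split on '&&', '||' and ';' that appear outside quotes."""
    pieces, start = [], 0
    for match in TOKEN.finditer(script):
        if match.group() in SEPARATORS:
            piece = script[start:match.start()].strip()
            if piece:
                pieces.append(piece)
            start = match.end()
    tail = script[start:].strip()
    if tail:
        pieces.append(tail)
    return pieces


print(split_outside_quotes(
    'echo "a; b" && apt-get update || true; echo \'x && y\''))
```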
ForgetMe17 added a commit to ForgetMe17/tern that referenced this issue Jun 3, 2020
Dockerfiles are added to tests/dockerfiles/split_shell_script.

Works towards tern-tools#682.

Signed-off-by: WangJL <hazard15020@gmail.com>
nishakm pushed a commit that referenced this issue Jun 3, 2020
Dockerfiles are added to tests/dockerfiles/split_shell_script.

Works towards #682.

Signed-off-by: WangJL <hazard15020@gmail.com>
rnjudge pushed a commit to rnjudge/tern that referenced this issue Jun 5, 2020
Dockerfiles are added to tests/dockerfiles/split_shell_script.

Works towards tern-tools#682.

Signed-off-by: WangJL <hazard15020@gmail.com>
ForgetMe17 added a commit to ForgetMe17/tern that referenced this issue Jun 9, 2020
Add function split_shell_script() in tern/utils/general.py.
This function receives a shell script, split it into statements:
- command
- variable
- loop
- branch

In this commit, regular expressions are used to split the shell script by the separators
'&&', '||', '|', ';' and ':;' (a special case).
To skip quoted text while splitting, (*SKIP)(*F) is used. This feature is
supported by the third-party regex module, not by Python's built-in re module.

Add function parse_shell_variables_and_command().
Add function parse_shell_loop_and_branch().

Works towards tern-tools#682.

Signed-off-by: WangJL <hazard15020@gmail.com>
ForgetMe17 added a commit to ForgetMe17/tern that referenced this issue Jun 16, 2020
This commit enables parse_shell_loop_and_branch() to extract
statements for a loop.

Works towards tern-tools#682.

Signed-off-by: WangJL <hazard15020@gmail.com>
ForgetMe17 added a commit to ForgetMe17/tern that referenced this issue Jun 16, 2020
This commit enables parse_shell_loop_and_branch() to extract
statements for a loop, and uses clean_command() to clean up tabs
and line indentation.

Modify the statement dictionary: change the dict for a command from
{'content':..., 'command':...} to {'command':...}.
Since we will run parse_command() later, when finding the
installed software, we only record the concatenated command
string here.

Works towards tern-tools#682 and tern-tools#706.

Signed-off-by: WangJL <hazard15020@gmail.com>
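The cleanup step mentioned in this commit might be approximated as follows. clean_command_sketch is a hypothetical stand-in, not the actual clean_command(); it joins backslash-newline continuations and collapses every run of whitespace, which would also alter whitespace inside quoted strings.

```python
import re


def clean_command_sketch(command: str) -> str:
    """Hypothetical: turn a multi-line RUN fragment into one clean line."""
    joined = command.replace("\\\n", " ")       # backslash-newline continuations
    return re.sub(r"\s+", " ", joined).strip()  # tabs, newlines, repeated spaces


raw = "apt-get update \\\n\t&& apt-get install -y \\\n\t\twget"
print({"type": "command", "command": clean_command_sketch(raw)})
```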
ForgetMe17 added a commit to ForgetMe17/tern that referenced this issue Jun 18, 2020
The pipe symbol '|' should not be treated as a separator.
The 'export' command can be treated as a variable assignment.

Works towards tern-tools#682 and tern-tools#706.

Signed-off-by: WangJL <hazard15020@gmail.com>
ForgetMe17 added a commit to ForgetMe17/tern that referenced this issue Jun 18, 2020
The pipe symbol '|' should not be treated as a separator.
The 'export' command can be treated as a variable assignment.

Works towards tern-tools#682.

Signed-off-by: WangJL <hazard15020@gmail.com>
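The two rules in this commit (keep a pipeline as one command, treat export as an assignment) can be sketched like this. parse_variable_or_command_sketch is a hypothetical helper; it assumes the statement was already split on '&&', '||' and ';', handles only one NAME=value per statement, and reuses the name/value and command keys mentioned earlier in this thread.

```python
import re

# Optional 'export ' prefix, then NAME=value.
ASSIGNMENT = re.compile(r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$", re.DOTALL)


def parse_variable_or_command_sketch(statement: str) -> dict:
    """Hypothetical: classify one statement as a variable or a command.

    Pipelines such as `curl ... | tar -xz` stay a single command string.
    Limitation: `FOO=bar cmd` would be misread as a plain assignment.
    """
    text = statement.strip()
    match = ASSIGNMENT.match(text)
    if match:
        return {"type": "variable", "name": match.group(1), "value": match.group(2)}
    return {"type": "command", "command": text}


print(parse_variable_or_command_sketch("export PATH=/usr/local/go/bin:$PATH"))
print(parse_variable_or_command_sketch('curl -fsSL "$url" | tar -xz'))
```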
@ForgetMe17
Contributor Author

Fixed in PR #717. Closing.
