Autor: Luiz Carlos
Data: 31/12/2021
Bash is a Unix shell and command language written by Brian Fox for the GNU Project as a free software replacement for the Bourne shell. First released in 1989, it has been used as the default login shell for most Linux distributions. A version is also available for Windows, via the Windows Subsystem for Linux.¹
# Showing output and overwritting a existing file using pipe (|) and tee
command | tee file.txt
# Showing output and append it to a existing file
command | tee -a file.txt
# Showing output and append it to a existing file
command | tee -a file.txt
Those are a *nix command line used to display the contents of a file in a console.
The basic usage of more command is to run the command against a file as shown below:
more filename.ext
less filename.ext
Option for less and more:
-g # highlight serch, last match
-G # highlight serch
-s # squeze long lines
-S # chop-long-lines
MOVING:
e or ▼ # forward one line
y or ▲ # backward one line
f or space # forward one window
b # backward one windown
SEARCHING:
/pattern # search for a parttern forward
?pattern # search for a pattern backward
&pattern # display only matching lines
JUMPING
g - ESC-< # Go to first line in file (or line N).
10g # Jump to a specific line forward.
G - ESC-> # Go to last line in file (or line N).
10G # Jump to a specific line backward.
In nano
ctrl + shift + - and choose the line number
Exit
q # quit reading file
* # Any kind of character one or more times
? # Any kind of character one time
[0-9] # Any number character
[a-z] # Any alphabetic character
{pattern1,pattern2,etc.}.png
Usage:
ls *
ls *.png
ls [0-9]*.png # find a match which have a number repeated mulpliple times. Such as 111.png, 90876.png etc.
ls [a-z]*.png # The same as above, but now with alphabetical characteres.
ls ?mage.png # Find a match which starts with any character only one time folowed by mage.png
ls {file1,file2}.png
Description:
grep searches for PATTERNS in each FILE. PATTERNS is one or more patterns separated by newline characters, and grep prints each line that matches a pattern. Typically PATTERNS should be quoted when grep is used in a shell command.
A FILE of “-” stands for standard input. If no FILE is given, recursive searches examine the working directory, and nonrecursive searches read standard input.
In addition, the variant programs egrep, fgrep and rgrep are the same as grep -E, grep -F, and grep -r, respectively. These variants are deprecated, but are provided for backward compatibility.
Sintax:
grep [OPTION] PATTERNS [FILE]
grep [OPTION] -e PATTERNS [FILE]
grep [OPTION] -f PATTERN_FILE [FILE]
Options:
-e PATTERNS, --regexp=PATTERNS
Use PATTERNS as the patterns. If this option is used multiple times or is combined with the -f (--file) option,
search for all patterns given. This option can be used to protect a pattern beginning with “-”.
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. If this option is used multiple times or is combined with the -e (--regexp) option,
search for all patterns given. The empty file contains zero patterns, and therefore matches nothing.
-i, --ignore-case
Ignore case distinctions in patterns and input data, so that characters that differ only in case match each other.
-v, --invert-match
Invert the sense of matching, to select non-matching lines.
-c, --count
Suppress normal output; instead print a count of matching lines for each input file. With the -v, count non-matching lines.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-n, --line-number
Prefix each line of output with the 1-based line number within its input file.
-P, --perl-regexp
Interpret PATTERNS as Perl-compatible regular expressions (PCREs). This option is experimental when combined with
the -z (--null-data) option, and grep -P may warn of unimplemented features.
Usage:
grep pattern file.ext
# find a pattern in all files.txt
grep "pattern" *.txt
# return the line number of the pattern
grep -n "pattern" file.ext
# Counts the occurences of the pattern
grep -c "pattern" file.ext
# to ignore case sensitive
grep -i "PatTern" file.ext
# to grep all except the pattern
grep -v "pattern" filename
# to grep multiple patterns
grep "[I,D]"
# to grep multiple patterns with extended expression
egrep "pattern1|parttern2" file
grep -e "pattern1" -e "pattern2" file
# Seqching from pattern with range of number
grep --regexp=ID={0..100}_{1..3} file
# to show x lines Before pattern
grep -B 10 "pattern" *.txt
# to show x lines after pattern
grep -A 10 "pattern" *.txt
Usage with --perl-regexp patterns
\d = digit character
\D = non digit character
\t = tab delimited
\n = end line, equal to $
\w = word
\W = non word
\s = whitespace
\S = non whitespace
. = any character
+ = 1 or more character
* = zero or more characteres
grep -P "^ID=\d+"
grep -P "ID=\d+\tName:\w" file.ext
SED command in UNIX stands for stream editor and it can perform lots of functions on file like searching, find and replace, insertion or deletion.
Usage:
sed OPTIONS... [SCRIPT] [INPUTFILE...]
options:
-i ------> change the file in place
-e ------> print without changing the file
-n ------> show just the result of the command
s -------> replace a pattern for another
p -------> p at the end, prints
g -------> g at the end, change all accurrences
Sed command is mostly used to replace the text in a file.
# Replacing or substituting string not in place:
sed 's/strig_to_find/new_string/' file.txt
# Replacing or substituting string in place:
sed -i 's/strig_to_find/new_string/' file.txt
# Replacing a string between lines 3 and 9:
sed '3,9s/strig_to_find/new_string/' file.txt
# Delete whole line with the string specified on the line
sed -i '/string/d' file.txt
# insert a string at the beginning of each line
sed 's/^/string/' file.txt
# replace all occurrences of the strings “marcos”, “luiz”, “karol” by “friends”
sed 's/marcos\|luiz\|karol/friends/g' file.txt
# Insert the string ‘#’ at the beggining of lines 1 to 8
sed -i '1,8s/^/#/' file.txt
# Substitute “foo” by “bar” onle at lines with “##”
sed '/##/s/foo/bar/g' arquivo.txt
# prints only the lines each starts with the string ‘http’
sed -n '/^http/p' file.txt
# print only the line 9
sed -n '9p' file.txt
# Print lines from line 6 to 9
sed -n '6,9p' file.txt
# Removing blank lines from a file
sed -i '/^$/d' file.txt
sed -n -e '10,1000p' input.txt > output.txt
# -n means don't print each line by default.
# -e means execute the next argument as a sed script.
# 10,1000p is the subseting, from line 10 until line 100 (inclusive)
# p iqual to print those lines
Awk is a pattern scanning and processing language for manipulating data reports. Awk allows us to use variables, numerif funcions, string functions, and logical operators.
With this utility is possible to write programs in form of statements that define text patterns to be searched. By default it reads standard input and writes standard output.
Useful to:
- Transform data
- Produce formated reports
- Performs condicional and loops
- Perform arithmetic and strings operations
Filds in awk
awk represent a line as a record and is represented as NR1, NR2, NR3 till length of file. And each record is splited in fild by whitespace and stored variable called $1, $2, $3 ... and $0 is a entire line.
Options:
-F or --field-separator=fs modify the fild of separator.
Syntax:
awk [options] "pattern {action}" input_file.ext > output_file
awk -F: '/pattern/ { print $0 }' input_file.ext > output_file
# In case below, no pattern was used, than the action will be aplied to all record/lines.
awk '{sum += $1 }; END { print sum }' input_file.ext > output_file
The 1 at the end of the program, is a condition (always true) with no action, so it executes the default action for every line, printing the line (which may have been modified by the previous action in braces or not).
awk '/^>/ {$0= "> new name"}1' teste.txt > teste.txt
awk '/^>/ {$0= "> new name"} {print $0}' teste.txt > teste.txt # this has the same effect.
Usage:
# Searches for a pattern in a file, and prints the line
awk "/pattern/ {print}" input_file
# Searches for a pattern in a file, and prints the fild 1 and 4
awk "/pattern/ {print $1, $4}" input_file
# Print the line if the fild 1 ==A and fild2 == T
awk '$1 == "A" && $2 =="T"' input_file
# Divide the fild1 by 4 and print it
awk '{x=$1/4; print x}' input_file
# If the file has a line 3, print the fild 1 and 3
awk '{if(NR==3) print $1, $2}' input_file
# If the file has a line 7, print a substring of all line, from 2 to end
awk '{if(NR==7) print substr($0, 2, 100)}' input_file
# declare a variable, and exceute a if statement
awk '{var=10; if(length($1) > var) print $0}' input_file
# rename a fasfa header
awk '/^>/ {$0= "> new name"}1' teste.fa > teste.txt
# the 1 is a pattern; it evaluates to true. The action is not specified, so the default action is invoked, which is {print}
# Print only one sequence from a multifasta file
awk 'BEGIN {RS=">"} /chr1/ {print ">"$0}' teste.fa > teste.txt
awk '{if (RS!=">") print $0}'
Explanation:
BEGIN ---------- This tells the awk to execute the immediately following code in curly brackets at the beginning.
{RS=”>”} ------- Record separator (Since every sequence starts with a “>” sign. This will separate the sequences records from the file)
/Chr2/ --------- keyword to search in the entire record
{print “>”$0} -- $0 is the current record (From “Chr1” to the entire sequence till next “>”).
Add “>” at the beginning to get the standard identitifer which is not included in $0.
Searches for sequences in a multifasta file, and retrieves thoses which sequences matches the ids/or full name in the a file with a list of ids.
-f/--pattern-file for a file of patterns
-n/--by-name for matching full name instead of just ID.
conda install -c bioconda seqkit
seqkit grep -n -f retrons_ids.txt sequences.fasta > retrons_sequences.fasta
seqkit grep -f retrons_ids.txt sequences.fasta > retrons_sequences.fasta
Description:
Compare sorted files FILE1 and FILE2 line by line.
Syntax:
comm [OPTION]... FILE1 FILE2
# With no options, produce three-column output:
# Column one contains lines unique to FILE1,
# column two contains lines unique to FILE2, and
# column three contains lines common to both files.
comm fileA.txt fileB.txt
# suppress column 1 (lines unique to FILE1)
comm -1 fileA.txt fileB.txt
# suppress column 2 (lines unique to FILE2)
comm -2 fileA.txt fileB.txt
# suppress column 3 (lines that appear in both files)
comm -3 fileA.txt fileB.txt
# Show the differences from two outputs
comm -3 <(zcat SRR11940311.1/*_1.fastq.gz | grep "@" | cut -d " " -f 1) <(zcat SRR11940311.1/*_2.fastq.gz | grep "@" | cut -d " " -f 1)
# Print only lines present in both file1 and file2.
comm -12 fileA.txt fileB.txt
# Command Substitution: $(...)
diff $(ls) $(ls)
# Process Substitution: <(...)
diff <(cat file1 | cut -f 1) <(cat file2 | cut -f 1)
This command modify files permissions, controling who can access files, search directories, and run scripts.
Usage: chmod [mode] [operator] [permission] file.ext
Mode ugoa:
These letters "ugoa" in the command chmod definy which of the user will have ther permissions modified.
u : Owner of the file (user);
g : Members of the group the file belongs (Groups);
o : any other user (others);
a : all user of the system (all).
If none is passed to the command, the system will uses all as default.
Operators:
With the command chmod, has to be used with a operator, to specify the kind of permission which will be given.
+ : Grants the permission. The permission is added to the existing permissions;
– : remove permissions specified;
= : Set a permission and remove others.
Values:
r : The read permission.
w : The write permission.
x : The execute permission.
Checking the permission
ls -l file.txt
output: I separated them to a better visualization:
- rwx rw- r-- 1 luiz luiz 6kb Jan 12 14:38 file.txt
d rwx rwx rw- 1 luiz luiz 6k Jan 12 14:38 data
where the first fild "-" means the file is not a directory, like in the seconda exemple, represented by "d".
The seconda fild "rwx", represent the permissions for the user;
The third fild "rw-", represents the permissions for others;
And the forth fild "r--", represents the permissions to all.
Setting a permissions
# removing permission to write and execute to group and others
chmod go-wx file.txt
# set only permission to read and write to group and others
chmod go=rw file.txt
# Add permission to execute to group and others
chmod go+x file.txt
sample="1 2 3 4"; for i in $sample; do echo "wellcome $i times"; done
first declare an array of string with type.
Then Iterate the string array using for loop.
The doble quote around ${StringArray[@]}, force the whitespaced variable to be read as a simgle variable.
StringArray=("Linux Mint" "Fedora" "Red Hat Linux" "Ubuntu" "Debian" ); for val in "${StringArray[@]}"; do echo $val; done
array=("seq1", "seq2", "seq3") for i in "${array[@]}"; do wget -N "ftp://ftp.patricbrc.org/genomes/$i/$i.fna"; done
# simple run
bash script.sh
# Run in second plan
bash script.sh &
# redirect both stdout and stderr to file.log.txt
bash script.sh >& script.log.txt
# to run the scrip withou hang up, redirect both stdout and stderr to file.log.txt and run in second plan
nohup sh script.sh >& script.log.txt &
This kind of running is required when we need to activate a conda env, where all the programs needed to run the script are.
I do recommend not use any kind of characters as (_ , - , /) in the script name.
At the top of the script, starts as follow:
#!/usr/bin/env bash # shabang sign - Tells the operating system which interpreter to use.
set -o errexit # This will exit, if any erro happens.
conda activate genomics # This will activate the env we need to execute the script, change for the one you have.
command 1
command 2
command 3
Before run any command, let's ussume we have a env called genomics, and there are all the programs needed to execute the following script.
# Note: this script starts as showed above.
# running in interactive mode.
bash -i script2.sh
# redirect both stdout and stderr to file.log.txt
bash -i script2.sh >& script2.log.txt
# Now, let's assume we don't wrote in the script to activate genomics
# We can open the terminal and do as follow:
conda activate genomics
./script2.sh >& script2.log.txt # This comand will run the script interacting if the current env.
.bashrc is a login scripts which load a user's personal preferences. Upon creation of their account, a .bashrc file with default settings is copied into a user's home directory. The user can then modify that file to customize their session. They can modify environment variables, load modules, create aliases and activate Python virtual environments. Users can also modify .bashrc to change aesthetic things like the terminal color scheme and what their bash prompt displays. To edit your .bashrc file use a command line editor like vim or nano:
echo "alias open='wslview'" >> ~/.bashrc
WE can edit the .bashrc too from nano.
nano ~/.bashrc
# with the file opened:
"alias open='wslview'"
source ~/.bashrc
unalias alias_name