Skip to content

Latest commit

 

History

History
432 lines (301 loc) · 11.4 KB

3.1-bash_Unix_commands-Basics.md

File metadata and controls

432 lines (301 loc) · 11.4 KB

UNIX commands used in bioinformatics

Autor: Luiz Carlos
Data: 31/12/2021

Bash is a Unix shell and command language written by Brian Fox for the GNU Project as a free software replacement for the Bourne shell. First released in 1989, it has been used as the default login shell for most Linux distributions. A version is also available for Windows, via the Windows Subsystem for Linux.¹

Basic commands line used

sudo                # super user do
sudo apt update     # to update
sudo apt update && sudo apt upgrade
apt search 'app name'

pwd                 # Currently working directory
history             # list of command used. It's possible to use arrow down and up to show the last 100 comands.
clear               # clear command prompt
cat filename.ext    # to show or execute the file

## Disck space and memory usage
df 
df -h .         # for human size readble
df -h -m .      # for human readble in megabyties
du -sh          # size of pwd
du -sh dir_name

htop or top         # to get memory and cpu usage info. Press q to exit
lscpu               # to get system info, like total memory, number of cpus and threads, architecture, model of the processor etc.

List the content of a directory

# list everything in pwd
ls

# list the content of a specific directory
ls dir_name

# ls command with flags
ls [flags] [directory]

ls -d */     # to list only directories
ls *         # to list the contents of the directory with it's subdirectories:
ls -R        # to list all files and directories with their corresponding subdirectories down to the last file:
ls -s        # to list files or directories with their sizes:
ls -l        # to list the contents of the directory in a table format
ls -lh       # to list the files or directories in a human readable table format
ls -a        # to list files or directories including hidden files or directories 
ls -t        # to list files or directories and sort by last modified date and time in descending order
ls -tr       # the same as above in reverse order
ls -S        # (the S is uppercase) command to list files or directories and sort by date or time in descending order (biggest to smallest).

Create a directory

Create the DIRECTORY(ies), if they do not already exist.

Usage:

mkdir [OPTION]... DIRECTORY Nane...

Mandatory arguments to long options are mandatory for short options too:

-m, --mode=MODE   # set file mode (as in chmod), not a=rwx - umask
-p, --parents     # no error if the diretory already exist, make parent directories as needed
-v, --verbose     # print a message for each created directory

Remove files and directories

rmdir dir_name # remove directory if its empty
rm -r dir_name # remove all directory and everything in it.
rm -l dir_name # remove but prompt which file should be removed 
rm -v dir_name # verbose what is going on
rm -i filename # this will ask if we sure we want to delete

Change directory

cd          # move to home directory
cd /        # move root directory
cd dir_name # move a specified directory
cd -        # move one directory up
cd ..       # move one directory up
cd ../../   # move two directory up

Copy Files and Directories

Common options used with the cp command, include:

-a – archive, never follow symbolic links, preserve links, copy directories recursively  
-f – if an existing destination file cannot be opened, remove it and try again  
-i – prompt before overwriting an existing file  
-r – copy directories recursively  
# Copying a single file to a destination directory
cp data.txt /var/tmp/

# Copying multiple files to a destination directory
cp data.txt file.csv /var/tmp/

# Copying a directory (and it’s contents) recursively
cp -r /etc/ /var/tmp/backup/

Move Files and Directories

The mv command will move or rename files or directories, or can move multiple sources (files and directories) to a destination directory.

The basic syntax of the mv command is:

mv [options] source destination

To move multiple files/directories into a destination

mv [options] source1 source2 [...] destination

Common options used with the mv command:

-f – do not prompt before overwriting  
-i – prompt before overwrite  
-u – move only when the source file is newer than the destination file or when the destination file is missing  

Note 1: that if the destination exists, it will be overwritten unless the -i option is used.

Note 2: If a file or directory is moved to a new name within the same directory, it is effectively renamed.

Rename a file

mv -i oldname newname

Creating files

# create empty files
touch filename.ext 

# create a empty file in a specific directory
touch /home/doc.txt 

Writting in files

echo msg to be written file.ext

Search for files

find -name dir_or_filename

Redirect outputs from command line into a file - Part 1

# Overwriting the file or creating a new one
command > file.txt

# Appending into a file
command >> file.txt 

Visualize files - Part 1

head and tail

head file_name.ext
tail file_name.ext

# With the option -n plus a number specifies the number of lines to be shown.
head -n10 file_name.ext

sort a file

Write sorted concatenation of all FILE(s) to standard output.

With no FILE, or when FILE is -, read standard input.

# Defualt alphabetically
sort filename

# Sort by column 2
sort -k 2 filename

# sort by column numerically
sort -k 2n filename

# sort reversely and numerically 
sort -k 2rn filename

# sort bby multiple columns
sort -k 2 -k 3 filename

# sort uniquely
sort -u filename

# sort in reverse order
sort -r filename              

Return unique lines in a file with UNIQ

Return unique lines, if consecutive occurances are found in sequence

Usage:

uniq filename

# counts the consecutive occurences
uniq -c filename

WC - Word Count in a file

Description:
Counts the number of lines, words and characteres of a file.

options:

-c : counts bytes.  
-l : counts lines.  
-L : show the lenght of longest line.
-m : counts characteres.
-w : counts words.

usage:

wc [option] [file]  
wc -l file

Subset files with CUT

cut is a command-line utility that allows you to cut parts of lines from specified files or piped data and print the result to standard output. It can be used to cut parts of a line by delimiter, byte position, and character.

Options:

-f (--fields = LIST)______ # Select by specifying a field, a set of fields, or a range of fields.  
-d (--delimiter)__________ # Specify a delimiter that will be used instead of the default “TAB” delimiter..  
-c (--characters = LIST)__ # Select by specifying a character, a set of characters, or a range of characters..  

The LIST argument passed to the -f, -b, and -c options can be an integer, multiple integers separated by commas, a range of integers or multiple integer ranges separated by commas.

The syntax for the cut command:

# default delimiter tab
cut -f 1,7-10 featurecounts.txt > counts.txt

# specifying a new delimiter
cut -d ' ' -f 1,7-10 featurecounts.txt > counts.txt

Downloadinf files and data with wget

Wget is a command-line utility for downloading files from the web. With Wget, you can download files using HTTP, HTTPS, and FTP protocols. Wget provides a number of options allowing you to download multiple files, resume downloads, limit the bandwidth, recursive downloads, download in the background, mirror a website, and more.

Options:

-nv ------> non verbose
-O -------> Saving the downloaded file under different name 
-P -------> Download the file into a specific directory
-c -------> Resuming a download from the previous one
-b -------> Downloading in background
-i -------> Downloading multiple files (this options need to be followed by the path to a local or external file containing a list of the 
URLs to be downloaded. Each URL needs to be on a separate line).

Downloading files from internet

wget -o /mnt/c/file.txt 'link to download the file'

Download a fasta file from NCBI into the PC with (eutils):

wget "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=NC_045512.2&rettype=fasta" -O covid_genome.fa 

Download a list of fastq files:

wget -i urls_fastq_files.txt

Compress and Uncompress files

Option:

-c, --stdout    = compress but, write on standard output, keep original files unchanged  
-k, --keep       = keep (don't delete) input files  
-d, --decompress = decompress  
-t, --test       = test compressed file integrity  
-v, --verbose    = verbose mode  
-l, --list       = list compressed file contents  

Usage:

# To compress
gzip --keep file_name.ext

# Uncompress files
gunzip -k file_name
gzip -d file_name


# to compress and decompress using bcftools util
bgzip -c file_name.ext > filename.ext.gz 

bunzip file_name.ext.gz

Group and compress multiple files with TAR

Descrition:
tar (Tape ARchiving) group/ungroup files.

Sintax:
tar [opções] grouped_name.tar [files]

Options command:

-c : creates a new tar file.  
-t : list the content of tar file.  
-x : extract the content of tar.  
-v : "verbose" show mensagens of what is happening.  
-f file : defines the name of the tar file.  
-z ou −−gzip ou −−gunzip : compress/uncompress files using gzip/gunzip.  
-j ou −−bzip2 : compress/uncompress using bzip2.
-?, −−help : mostra as opções do comando.  
# To creates a file colection_txt with group all txt files in the pwd.
tar -cvf colection_txt.tar *.txt

# To verify the content of the archive colection_txt
tar -tvf colection_txt.tar

# to extract all the file from colection_txt
tar -xvf colection_txt.tar

# to extract a specific file from colection_txt 
tar -xvf colection_txt.tar filename.ext


### Creating file grouped and compressed:

# To compress the files grouped by tar, we can use gzip
tar -czvf colection.tar *.txt

# to compress using bzip2 with starts with test
tar -cjvf colection2.tar *.txt

Shortcuts

ctrl + l         # clear cmd
ctrl + d         # quit()
ctrl + c         # interrupt
ctrl + z         # interrupt abruptly
ctrl + a         # move to the start of line
ctrl + e         # move to the end of line
ctrl + shift + c # copy
ctrl + shift + v # paste
right click of mouse to paste
jupyter notebook --no-browser
'dir name'      # use of '' to acess dir with spaces in the name
up and down arrows to show previewly command used

Utilities for bioinformatics

TREE

Descrição:

Este utilitário lista o conteúdo de um diretório usando o formato de árvore. Ele tem a mesma função do comando ls. A diferença consiste na maneira como as informações são exibidas.

Installation:

sudo apt install tree

Algumas opções do comando:

-a : lista todos os arquivos, inclusive os arquivos ocultos.  
-d : lista somente os subdiretórios.  
-f : exibe o caminho completo dos arquivos.  
-p : exibe as permissões dos arquivos.  
−−help : exibe as opções do utilitário.  
−−version : mostra informações sobre o utilitário.  
tree -d # lista os diretórios do pwd
tree -d dir_name # lista o diretório especificado
tree -a