Browsing files

Andrea Telatin edited this page Apr 8, 2020 · 10 revisions

Listing files

The general syntax is: ls [options] [files]. Both the options and the files are optional, and files can be files or directories. Some of the most useful options:

  • -a Also show hidden files
  • -l Long format, will show one file per line, with size, owner, date…
  • -h Used with -l, will display file size in human-readable format (e.g. 2.2M instead of 2298011)
  • -d Show directories as files, without listing their content

The options can be combined, and the following two commands are equivalent:

ls -l -h -a
ls -lha
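
To see what each option changes without depending on the content of your own account, here is a minimal sketch that builds a scratch directory first. The file names (visible.txt, .hidden, subdir) are invented for the example, and mktemp, touch, mkdir and rm are only helpers that create and remove the scratch area.

```shell
# Create a scratch directory with one visible file, one hidden file, one subdirectory
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden"
mkdir "$demo/subdir"

# Plain ls: names starting with a dot are not shown
plain=$(ls "$demo")
echo "$plain"

# -a: also shows ".", ".." and ".hidden"
with_hidden=$(ls -a "$demo")
echo "$with_hidden"

# -d: prints the directory itself instead of listing its content
dir_itself=$(ls -d "$demo")
echo "$dir_itself"

# Separate and combined switches produce the same listing
separate=$(ls -l -h -a "$demo")
combined=$(ls -lha "$demo")
echo "$combined"

# Remove the scratch directory
rm -rf "$demo"
```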

If we want to list the files in the root directory, we don't need to move there: we simply tell ls which path to scan:

ls /

Here is another example:

ls /homes/qi/tutorial/

You can type as many paths (files or directories) as needed in a single ls command:

ls -l ~/.bashrc ~/.screenrc /homes/qi/tutorial/

Using the "shell expansion": wildcards to select multiple files

As we noticed, ls can receive more than one file. Usually, though, we don't type every single item to be listed: we use wildcards, and the shell expands our shortcuts into a list of paths before running the command. We can use wildcards, ranges and lists.

Symbol | Meaning | Example
* | Any set of characters (any length) | *.fasta: all files ending with “.fasta”
? | A single character | A???.txt: files starting with A, followed by exactly 3 chars, ending with “.txt”
[a-z] | Range: any single lowercase letter | file1[a-c].txt: files called file1a, file1b and file1c, ending with “.txt”
[0-9] | Range: any single digit | reads_R[1-2].fastq: reads_R1.fastq and reads_R2.fastq
{a,b} | Comma-separated list of words | photo_{andrea,john}.jpg: photo_andrea.jpg and photo_john.jpg
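
The patterns in the table can be tried safely with echo, which simply prints whatever the shell expanded. The sketch below assumes bash and uses invented file names created in a scratch directory. Note one difference: *, ? and [...] only match files that exist, while the {a,b} list is expanded into words whether or not such files exist.

```shell
# Work inside a scratch directory with a few empty, made-up files
start=$PWD
demo=$(mktemp -d)
cd "$demo"
touch seq.fasta Abcd.txt Ab.txt file1a.txt file1b.txt file1d.txt
touch reads_R1.fastq reads_R2.fastq reads_R3.fastq

# echo prints the list of names produced by the shell expansion
star=$(echo *.fasta)                 # seq.fasta
question=$(echo A???.txt)            # Abcd.txt (Ab.txt has too few characters)
range=$(echo file1[a-c].txt)         # file1a.txt file1b.txt (file1d.txt is out of range)
digits=$(echo reads_R[1-2].fastq)    # reads_R1.fastq reads_R2.fastq

# In bash the list is expanded even though these photos do not exist
braces=$(echo photo_{andrea,john}.jpg)

echo "$star"
echo "$question"
echo "$range"
echo "$digits"
echo "$braces"

# Go back and clean up
cd "$start"
rm -rf "$demo"
```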

Getting toy files (💡 if you didn't download them before)

This course comes with a set of directories and test files. To download it you will need an Internet connection on the machine you are logged into (so if you use a cluster you might need to move to a net-enabled node).

cd 
git clone https://github.com/telatin/learn_bash

This command will download the latest version of "learn_bash". Since we first used the cd command to return to our home directory, we should have a ~/learn_bash/ directory in our account now.

We should be in our home directory. Check with pwd.

To enter the new directory, type (remember the TAB):

cd learn_bash

Now, using cd and ls try figuring out:

  • How many directories are inside the examples directory
  • The content of each directory

Creating a directory, copying some files

Create a directory called copies inside the examples directory (~/learn_bash). There are many ways to do it. ⚠️ If you are already inside “learn_bash”, just type:

mkdir copies

Otherwise, you have to craft the proper relative or absolute path (e.g. mkdir ~/learn_bash/copies works from any directory, because ~ expands to the absolute path of your home).

Let's try again to copy some files. In particular, we want a selection of files inside the phage directory:

# If we are not inside the examples directory:
cd ~/learn_bash/
# Copy some files
cp -v phage/*.f?? copies/

Here we use a new switch, -v (verbose), which prints each file as it is copied (useful to follow the progress). Combining the * and ? wildcards we select all the files whose extension has exactly three characters, the first being “f” (e.g. fna, faa).
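
To check which names a pattern like *.f?? really selects, the same copy can be rehearsed on empty files. This is only a sketch: the four file names below are invented and are not the actual content of the phage directory.

```shell
# Scratch area mimicking the phage/ and copies/ layout with made-up files
demo=$(mktemp -d)
mkdir "$demo/phage" "$demo/copies"
touch "$demo/phage/vir.fna" "$demo/phage/vir.faa"
touch "$demo/phage/vir.gff" "$demo/phage/vir.fasta"

# Only extensions made of "f" plus exactly two more characters are copied:
# vir.gff starts with "g", and vir.fasta has a five-character extension
cp -v "$demo"/phage/*.f?? "$demo/copies/"

# List what arrived in copies/
copied=$(ls "$demo/copies")
echo "$copied"

rm -rf "$demo"
```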

Comments

In bash, any text typed after a # is ignored. I will use this feature to explain some commands, like:

# The following line will list the files in your home
ls -l ~

Find

The find command can print all the files from a starting path, including directories and subdirectories.

Some examples:

# Print all files and directories in my home
find ~
 
# Print all files and directories in a specific path
find /usr/lib/ssl
 
# Print only directories / files
find ~ -type d
find ~ -type f
 
# Print files in my home with a specific extension
find ~ -name "*.txt"
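
Running find on a whole home directory can print thousands of lines, so here is a small sketch on a scratch tree with invented names. It also shows why the pattern is quoted: with the quotes, "*.txt" reaches find untouched instead of being expanded by the shell in the current directory.

```shell
# Build a small tree: two nested directories and three files
demo=$(mktemp -d)
mkdir -p "$demo/docs/old"
touch "$demo/notes.txt" "$demo/docs/readme.txt" "$demo/docs/old/data.csv"

# Directories only: the starting path itself, docs and docs/old
dirs=$(find "$demo" -type d)
echo "$dirs"

# Files only: notes.txt, readme.txt and data.csv
files=$(find "$demo" -type f)
echo "$files"

# Quoted pattern: find searches every subdirectory for names ending in .txt
txt=$(find "$demo" -name "*.txt")
echo "$txt"

rm -rf "$demo"
```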

Viewing text files

The simplest command is cat (concatenate), which prints the content of one or more files. Example:

cat ~/learn_bash/files/wine.csv
  • Can you type it using a relative path?

When a file is huge, it's very convenient to have a look at a fraction of it. The commands head and tail allow printing only the first (or last) lines of a file. By default they print 10 lines, but you can change this with -n:

head ~/learn_bash/files/wine.csv
head -n 3 ~/learn_bash/files/wine.csv
tail -n 5 ~/learn_bash/files/wine.csv
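
A quick way to convince ourselves of the defaults is a toy file whose lines are numbered. This sketch creates a twelve-line file (the name twelve.txt is invented) using printf and the > redirection that is explained later on this page.

```shell
# Write twelve numbered lines into a scratch file
demo=$(mktemp -d)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$demo/twelve.txt"

# Default: the first 10 lines
first_default=$(head "$demo/twelve.txt")
echo "$first_default"

# -n changes how many lines are printed
first_three=$(head -n 3 "$demo/twelve.txt")
last_two=$(tail -n 2 "$demo/twelve.txt")
echo "$first_three"
echo "$last_two"

rm -rf "$demo"
```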

Do you remember man? Good, because we can now use a new command to view text files interactively, and it behaves like man:

# Run it, then press 'q' to exit:
less ~/learn_bash/phage/vir_genomic.gff
 
# To disable line wrapping and see each line clearly:
less -S ~/learn_bash/phage/vir_genomic.gff

Counting lines

Counting the number of lines of a file is a common task. The wc (word count) command can do this, and something more.

# Count lines, words and bytes of a file (for plain ASCII text, bytes = characters):
wc ~/learn_bash/files/introduction.txt
 
# Count only lines:
wc -l ~/learn_bash/files/introduction.txt
 
# Also on multiple files
wc -l ~/learn_bash/phage/*.*
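
When wc receives several files it prints one line per file and then a grand total. A minimal sketch on two tiny invented files makes the numbers easy to verify by eye:

```shell
# Two small files: a.txt has 2 lines and 3 words, b.txt has 1 line
demo=$(mktemp -d)
printf 'one two\nthree\n' > "$demo/a.txt"
printf 'four\n' > "$demo/b.txt"

# Lines, words and bytes of a single file, followed by its name
wc "$demo/a.txt"

# Only the line count (the number comes first, then the file name)
a_lines=$(wc -l "$demo/a.txt")
echo "$a_lines"

# Several files at once: one row each, plus a final "total" row
all_lines=$(wc -l "$demo"/*.txt)
echo "$all_lines"

rm -rf "$demo"
```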

Extracting matching lines

grep is a powerful command to extract lines containing a pattern. The simplest use is “grep wordtosearch file”:

grep ">" ~/learn_bash/phage/vir_protein.faa

In this case, the word we looked for is simply the > character, that is, we extracted all the lines containing it. We are not going to expand this, but you can perform complex searches using a language called regular expressions. Some switches: -c to count the number of matching lines, -i to perform a case insensitive search, -v to print the lines not containing the pattern.
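
The three switches can be compared on a toy FASTA file. This is a sketch with invented sequences, not the content of vir_protein.faa. Note that the > must stay inside quotes: unquoted, the shell would treat it as a redirection and overwrite the file named after it.

```shell
# A tiny protein FASTA file with two made-up records
demo=$(mktemp -d)
printf '>seq1 Phage protein\nMKT\n>seq2 phage tail\nMLL\n' > "$demo/toy.faa"

# Lines containing ">" (the FASTA headers)
headers=$(grep ">" "$demo/toy.faa")
echo "$headers"

# -c: count the matching lines instead of printing them
n_headers=$(grep -c ">" "$demo/toy.faa")

# Without -i only the lowercase "phage" matches; with -i "Phage" matches too
n_sensitive=$(grep -c "phage" "$demo/toy.faa")
n_insensitive=$(grep -c -i "phage" "$demo/toy.faa")
echo "$n_headers $n_sensitive $n_insensitive"

# -v: the lines NOT containing ">" (the sequences)
sequences=$(grep -v ">" "$demo/toy.faa")
echo "$sequences"

rm -rf "$demo"
```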

See Presentation on regular expressions for grep

Redirecting the output

Create a ~/day2/ directory.

So far, every command we issued gave us some text lines that we inspected, but we never saved them for long-term storage. Consider the following command:

find ~/learn_bash -type d

If we want to save the output in a new file, the shell offers us a redirection symbol:

find ~/learn_bash -type d > ~/day2/directories.txt

With this command, we created a new file called ~/day2/directories.txt, where the output of find was stored.

⚠️ Note that if the file was already present, it would have been overwritten!
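
The overwriting is easy to observe on a scratch file. The sketch below also uses >>, an operator not covered on this page, which appends to the file instead of replacing it. It assumes the default shell settings (the noclobber option is off), and out.txt is an invented name.

```shell
demo=$(mktemp -d)

# The second redirection replaces the content written by the first
echo "first" > "$demo/out.txt"
echo "second" > "$demo/out.txt"
after_overwrite=$(cat "$demo/out.txt")
echo "$after_overwrite"

# >> adds the new line at the end, keeping what was already there
echo "third" >> "$demo/out.txt"
after_append=$(cat "$demo/out.txt")
echo "$after_append"

rm -rf "$demo"
```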

Output streams

Our commands print two types of text.

We explained the behaviour of most commands as a set of characters printed on our screen. This is a simplification: the characters printed can be either "real output" or user "messages" (technically called standard output and standard error). The '>' sign will redirect the standard output (or STDOUT), but sometimes we are interested in the standard error (or STDERR). Try:

ls -l ~/.bashrc ~/.404

What do you notice?

ls -l ~/.bashrc ~/.404 2> ~/ls_demo.err

Now you know how to redirect the standard error (i.e. using 2>).
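
Both redirections can be used in the same command, sending each stream to its own file. Here is a sketch with invented file names (exists.txt, missing.txt, ls.out, ls.err). ls ends with a non-zero exit status when a path is missing, so the example reports that status instead of stopping.

```shell
# One file that exists, and one path that does not
demo=$(mktemp -d)
touch "$demo/exists.txt"

# STDOUT goes to ls.out, STDERR goes to ls.err
ls "$demo/exists.txt" "$demo/missing.txt" > "$demo/ls.out" 2> "$demo/ls.err" \
  || echo "ls reported a problem (exit status $?)"

# The listing and the error message ended up in different files
out=$(cat "$demo/ls.out")
err=$(cat "$demo/ls.err")
echo "STDOUT was: $out"
echo "STDERR was: $err"

rm -rf "$demo"
```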

Let's look at a real-world example: when we align short reads against a reference, we expect the output to be the alignments (in SAM/BAM format), but the program may also want to print some information for the user (e.g. alignment progress, how many reads were unmapped…), and it will use the standard error for that.

Try the paths

Go to your home directory. Try counting the lines from two files you choose inside your home, plus /etc/passwd.

Now count the lines of /etc/passwd, but using a relative path!

Go to the ~/learn_bash/scripts/ directory and try to list the files in ~/learn_bash/files, using a relative path.

Finally, still from the ~/learn_bash/scripts/ directory, save into a file called phage_files_lines.txt, placed inside your home, the number of lines of each file inside the "learn_bash/phage" directory. Use only relative paths.

A trick

On our training server I installed a library that colorizes the STDERR (in yellow), leaving the STDOUT stream unchanged. To enable it:

source /homes/lib/load_stderred

Try now:

ls -l ~/.bashrc ~/.404

See also