# Accessing NCBI Databases with BioPython
(Víctor Sojo | vsojo@amnh.org)

Here we will be introducing **BioPython** and using it to interface with NCBI databases.

**References:**
+ The [_BioPython tutorial_](http://biopython.org/DIST/docs/tutorial/Tutorial.html).
+ Tiago Antao's book [_Bioinformatics with Python Cookbook_](https://www.packtpub.com/product/bioinformatics-with-python-cookbook-second-edition/9781789344691), which I highly recommend. You may be able to access it for free from your library (e.g., Columbia University has it in digital).

## Contents
&emsp;[Installing BioPython \(if you don't have it\)](#Installing-BioPython-\(if-you-don't-have-it\))<br/>
&emsp;[Importing necessary BioPython modules](#Importing-necessary-BioPython-modules)<br/>
&emsp;[Accessing NCBI via Entrez](#Accessing-NCBI-via-Entrez)<br/>
&emsp;&emsp;[Getting the list of available databases at NCBI](#Getting-the-list-of-available-databases-at-NCBI)<br/>
&emsp;[Download nucleotide records for a specific gene, by name, in a specific species](#Download-nucleotide-records-for-a-specific-gene,-by-name,-in-a-specific-species)<br/>
&emsp;[Downloading gene sequences given a list of IDs](#Downloading-gene-sequences-given-a-list-of-IDs)<br/>
&emsp;&emsp;[⚠️ Iterators are emptied after completing a single run ⚠️](#⚠️-Iterators-are-emptied-after-completing-a-single-run-⚠️)<br/>
&emsp;&emsp;[To permanently keep the data from an iterator, you could convert it to a list \(inefficient\)](#To-permanently-keep-the-data-from-an-iterator,-you-could-convert-it-to-a-list-\(inefficient\))<br/>
&emsp;[Saving sequence records to files with Bio.SeqIO.write\(\)](#Saving-sequence-records-to-files-with-Bio.SeqIO.write\(\))<br/>
&emsp;[Accessing the NCBI Taxonomy database](#Accessing-the-NCBI-Taxonomy-database)<br/>
&emsp;&emsp;[Searching broadly using Entrez.esearch\(\)](#Searching-broadly-using-Entrez.esearch\(\))<br/>
&emsp;&emsp;[Fetching specific taxonomic records using IDs in Entrez.efetch\(\)](#Fetching-specific-taxonomic-records-using-IDs-in-Entrez.efetch\(\))<br/>

⚠️ I'm assuming you followed the Py201 notebook, in which we installed `conda`, set up an environment called `bioinfo`, and considered differences for Windows users ⚠️

First, let's make sure that you're using the appropriate environment:

In [1]:
! echo $CONDA_DEFAULT_ENV

bioinfo


Looking good for me. If you see `base` or `root` instead of `bioinfo`, you should close this notebook and stop Jupyter, either by hitting **Quit** at the top-right of the main Jupyter browser page, by issuing `jupyter notebook stop` in a new terminal tab, or by hitting Ctrl+C twice in the terminal tab running Jupyter. Then load your `conda` environment from the terminal/Anaconda prompt:
```bash
conda activate bioinfo
```
and restart Jupyter:
```bash
jupyter notebook
```

⚠️ **Windows users:** ⚠️ Every time I write `! something` you should instead write **`!wsl something`**. For this to work, you need to have the Windows Subsystem for Linux (WSL) activated; without it, a significant part of the code in this workshop won't work for you. [Learn about the WSL here](https://docs.microsoft.com/en-us/windows/wsl/install-win10).

## Installing BioPython (if you don't have it)
You should have installed BioPython previously. If you haven't, you can do it inside this Jupyter Notebook itself:

In [2]:
! conda install biopython -y

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In my case, I already have `biopython`, so `conda` just searches for updates and then lets me know that I do have the requested packages installed. It will probably take a couple of minutes if you don't have it installed. 

## Importing necessary BioPython modules

In [3]:
from Bio import Entrez, SeqIO

BioPython is enormous, and we don't need all of it here. For this lesson, we only need the following modules:

Module      | Use
:-----------|:-----------------------------------------
**Entrez**  | Programming interface to retrieve data from NCBI
**SeqIO**   | To read and write bio-sequences

## Accessing NCBI via Entrez
**Entrez** is NCBI's communication portal, so we will be using it here to download information from there.

⚠️ **Important** ⚠️ Always inform NCBI of your email address!

In [4]:
Entrez.email = 'vsojo@amnh.org'

### Getting the list of available databases at NCBI
NCBI hosts a gigantic repository with multiple databases. Let's start by downloading the list of those databases:

In [5]:
info_handle = Entrez.einfo() # Use Entrez to get the information of NCBI databases into a result handle
rec = Entrez.read(info_handle) # Read that handle into an actual BioPython record
print(rec) # Print the record

{'DbList': ['pubmed', 'protein', 'nuccore', 'ipg', 'nucleotide', 'structure', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'gap', 'gapplus', 'grasp', 'dbvar', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'proteinclusters', 'pcassay', 'protfam', 'biosystems', 'pccompound', 'pcsubstance', 'seqannot', 'snp', 'sra', 'taxonomy', 'biocollections', 'gtr']}


You will probably recognise some of those names from your previous work or studies (e.g., `pubmed`, `nucleotide`, `protein`, `taxonomy`), but most of the others are probably unfamiliar.

Let's look for a gene in the `nucleotide` database next.

## Download `nucleotide` records for a specific gene, by name, in a specific species
We will download the `nucleotide` record _**references**_ that exactly match the `[Gene Name]` `"12S rRNA"` (this is the RNA of the small subunit of the mitochondrial ribosome). We will specify the red panda (_Ailurus fulgens_) as the `[Organism]` of interest... because red pandas are great.

In [6]:
# Perform a web search on the NCBI nucleotide database, for a gene in an organism:
ref_handle = Entrez.esearch(db='nucleotide', term='"12S rRNA"[Gene Name] AND "Ailurus fulgens"[Organism]')

Let's take a look at that search result stored in the `ref_handle` variable:

In [7]:
print(ref_handle)

<http.client.HTTPResponse object at 0x10654bbb0>


... well, that's not particularly helpful. This `ref_handle` variable holds the raw result of the communication with the NCBI database, including any actual results returned. To make use of these results (if any), we need to "parse" the handle into an actual record, as we did above when we got the list of databases. Once again we use the `Entrez.read()` function for this:

In [8]:
# Read the raw web-search results into a usable BioPython record variable
ref_recs = Entrez.read(ref_handle)

Let's take a proper look at the results variable:

In [9]:
print(ref_recs)

{'Count': '4', 'RetMax': '4', 'RetStart': '0', 'IdList': ['195934696', '155368555', '1871550', '417984'], 'TranslationSet': [{'From': '"Ailurus fulgens"[Organism]', 'To': '"Ailurus fulgens"[Organism]'}], 'TranslationStack': [{'Term': '"12S rRNA"[Gene Name]', 'Field': 'Gene Name', 'Count': '7977', 'Explode': 'N'}, {'Term': '"Ailurus fulgens"[Organism]', 'Field': 'Organism', 'Count': '425', 'Explode': 'Y'}, 'AND'], 'QueryTranslation': '"12S rRNA"[Gene Name] AND "Ailurus fulgens"[Organism]'}


The `ref_recs` variable seems to be a `dict`. I can never make sense of dictionaries when they are printed like this. Let's use the `dict.items()` method and the mighty f-strings to print it all more legibly:

In [10]:
for key, val in ref_recs.items():
  print(f"-{key:20s}: {val}") # the 20s means "it's a string, and I want it padded to at least 20 characters"

-Count               : 4
-RetMax              : 4
-RetStart            : 0
-IdList              : ['195934696', '155368555', '1871550', '417984']
-TranslationSet      : [{'From': '"Ailurus fulgens"[Organism]', 'To': '"Ailurus fulgens"[Organism]'}]
-TranslationStack    : [{'Term': '"12S rRNA"[Gene Name]', 'Field': 'Gene Name', 'Count': '7977', 'Explode': 'N'}, {'Term': '"Ailurus fulgens"[Organism]', 'Field': 'Organism', 'Count': '425', 'Explode': 'Y'}, 'AND']
-QueryTranslation    : "12S rRNA"[Gene Name] AND "Ailurus fulgens"[Organism]


OK, that's much clearer. This `dict` contains the `Count` of records that match the search and the list of their record IDs, `IdList`, amongst other information.

An interesting one is `RetMax`, which here is the same as `Count` (`4`). A search returns at most 20 IDs by default. In our case only 4 records matched anyway, which is why `RetMax` is `4`. But if your search matches more than 20 records and you want to retrieve more of them, you can pass e.g. `retmax=150` as an additional argument to `Entrez.esearch()` to get up to `150` IDs. To ask for _all_ of them, you could run the search once to learn the `Count` and then repeat it with `retmax=ref_recs['Count']` (NCBI caps how many IDs a single request can return, so very large searches need to be split into pages). However, be careful with that, since you may end up downloading a lot of data further down the line. At this point we're only downloading record IDs, so large numbers are not a problem, but in the following section we will download actual nucleotide records (which can be very heavy).
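As a rough sketch of how one might page through a large result set instead of requesting everything at once, the helper below computes the offsets for successive requests. `retstart` and `retmax` are E-utilities parameters that `Entrez.esearch()` accepts as keyword arguments; the helper name `page_starts`, the page size of 500 and the variable `my_term` are illustrative assumptions, not part of BioPython.

```python
def page_starts(total, page_size):
    """Return the retstart offset of each page needed to cover `total` hits."""
    return list(range(0, total, page_size))


# 'Count' comes back from Entrez.read() as a string, so convert it first.
count = int('1234')  # stand-in for int(ref_recs['Count'])
starts = page_starts(count, page_size=500)
print(starts)  # [0, 500, 1000]

# Each offset would then go into one request, along these lines (not run here,
# and my_term is a hypothetical search string):
# Entrez.esearch(db='nucleotide', term=my_term, retstart=start, retmax=500)
```

The ID lists from the individual pages can then be concatenated before moving on.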

In any case, we can use this dictionary to provide some useful output for the users (ourselves), for example:

In [11]:
# Print some information about how many records were found
print(f"There are {ref_recs['Count']} nucleotide records matching the search pattern on NCBI. {len(ref_recs['IdList'])} of these records were retrieved.")

print("Here are the record Ids retrieved:")
for rec_id in ref_recs['IdList']:
  print(rec_id)

There are 4 nucleotide records matching the search pattern on NCBI. 4 of these records were retrieved.
Here are the record Ids retrieved:
195934696
155368555
1871550
417984


Nice. This is useful information, but we only have the record IDs, not the actual gene sequences and info. Let's get that next.

## Downloading gene sequences given a list of IDs
So far, we have only downloaded the record _references_, not the gene records themselves. But we have the record IDs, so we can go ahead and download the full records using those IDs.

We only have 4 results here, so this should be a trivial download. However, if you have a lot of results (because you changed `retmax` above), you'll need to be careful about what you download and how. You may need to split the download into batches of a hundred or so.

Whatever you do, though, **don't download the records one by one** (e.g. with a `for` loop); it's inefficient and very bad Internet citizenship. Also, NCBI doesn't like it and may block your IP address (a good reason to give them your email address, just in case they want to talk to you to check you're not a bot).
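One possible way to split a long ID list into batches is sketched below. The helper names `batched` and `fetch_in_batches` and the default batch size of 100 are assumptions made for illustration; only `Entrez.efetch()` and `SeqIO.parse()` are BioPython calls. `fetch_in_batches` is defined but not called here, since it needs a network connection and `Entrez.email` to be set.

```python
def batched(ids, batch_size):
    """Yield consecutive slices of `ids`, each holding at most `batch_size` items."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def fetch_in_batches(ids, batch_size=100):
    """Yield GenBank records for `ids`, sending one efetch request per batch."""
    # Imported here so that batched() can be tried offline, without BioPython.
    from Bio import Entrez, SeqIO
    for batch in batched(ids, batch_size):
        handle = Entrez.efetch(db='nucleotide', id=batch, rettype='gb', retmode='text')
        yield from SeqIO.parse(handle, 'gb')
        handle.close()


# Offline check of the batching logic, using the four IDs found above:
ids = ['195934696', '155368555', '1871550', '417984']
for batch in batched(ids, batch_size=3):
    print(batch)
```

Each request still carries many IDs at once, so this is not the one-by-one loop warned against above; it simply keeps any single download to a manageable size.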

We have just 4, so let's give the full list to `Entrez`:

In [12]:
gene_handle = Entrez.efetch(db='nucleotide', id=ref_recs['IdList'], rettype='gb')

Note that in the previous section we used `Entrez.esearch()` because we weren't sure exactly what we wanted. Here we're using `Entrez.efetch()` instead, because we know exactly which records we want.

Also, we specified the `gb` (GenBank) format, which has plenty of information. We could have used `fasta` instead, but `gb` is very good for exploring.

Let's parse that handle and take a look at the records we retrieved:

In [13]:
gene_recs = SeqIO.parse(gene_handle, 'gb')
print(gene_recs)

<Bio.SeqIO.InsdcIO.GenBankIterator object at 0x19b9dc100>


This also doesn't look very helpful, but in this case it actually is. As you can see, it's an `iterator` (read [here](https://www.w3schools.com/python/python_iterators.asp) if you want to know more), so we can go over it with a `for` loop:

In [14]:
for i, rec in enumerate(gene_recs):
  print(f"\n\n# # # RECORD {i} # # #")
  print(rec)



# # # RECORD 0 # # #
ID: NC_011124.1
Name: NC_011124
Description: Ailurus fulgens mitochondrion, complete genome
Database cross-references: Project:30903, BioProject:PRJNA30903
Number of features: 77
/molecule_type=DNA
/topology=circular
/data_file_division=MAM
/date=14-APR-2009
/accessions=['NC_011124']
/sequence_version=1
/keywords=['RefSeq']
/source=mitochondrion Ailurus fulgens (lesser panda)
/organism=Ailurus fulgens
/taxonomy=['Eukaryota', 'Metazoa', 'Chordata', 'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', 'Eutheria', 'Laurasiatheria', 'Carnivora', 'Caniformia', 'Ailuridae', 'Ailurus']
/references=[Reference(title='Mitogenomic analyses of caniform relationships', ...), Reference(title='Direct Submission', ...), Reference(title='Direct Submission', ...)]
/comment=REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from AM711897.
COMPLETENESS: full length.
Seq('GTTAATGTAGCTTAATAAATAAAGCAAGGCACTGAAAATGCCTAGATGAGTTAC...TCT')


# 

---
Take a look at those results to make sure they make sense. That's easy here because there are only `4`, but with thousands of records you would need to screen them in some automated way (e.g. looking for desired terms such as `"12S"` and `"ribosomal"` in the `Description` and discarding any records that don't contain them).
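A minimal sketch of such a filter follows. It assumes that each record exposes a `.description` attribute, as BioPython `SeqRecord` objects do; the function name `keep_matching` is hypothetical, and the demo uses simple stand-in objects so that it runs without contacting NCBI.

```python
from types import SimpleNamespace


def keep_matching(records, required_terms):
    """Yield only the records whose description contains every required term."""
    for rec in records:
        description = rec.description.lower()
        if all(term.lower() in description for term in required_terms):
            yield rec


# Stand-ins for SeqRecord objects; only .id and .description are used.
# Both descriptions are taken from records retrieved in this notebook.
demo_records = [
    SimpleNamespace(id='NC_011124.1',
                    description='Ailurus fulgens mitochondrion, complete genome'),
    SimpleNamespace(id='L21885.1',
                    description='Ailurus fulgens mitochondrial 12S ribosomal RNA (12S rRNA) gene fragment.'),
]

for rec in keep_matching(demo_records, ['12S', 'ribosomal']):
    print(rec.id)  # L21885.1
```

Note what gets dropped: the complete mitochondrial genome contains the 12S rRNA gene, but its description doesn't mention it, so a description-based filter discards it. Whether that is what you want depends on your analysis.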

### ⚠️ Iterators are emptied after completing a single run ⚠️
Let's try to run that last `for` loop again:

In [15]:
for i, rec in enumerate(gene_recs):
  print(f"\n\n# # # RECORD {i} # # #")
  print(rec)

This time we get no output. This is because we had an `iterator`, which goes over each of its items only once, getting rid of the item as it produces it. This is **extremely efficient**, which is why Python insists that you use them whenever you can. However, it has limitations too, chiefly that you have no way of recovering the data once you've read it. In our case, this means we're forced to go back to NCBI again:

In [16]:
gene_handle = Entrez.efetch(db='nucleotide', id=ref_recs['IdList'], rettype='gb')
gene_recs = SeqIO.parse(gene_handle, 'gb')
print(gene_recs)

<Bio.SeqIO.InsdcIO.GenBankIterator object at 0x19b9dca90>


Good, we have our fresh iterator again. But every time we read something from it, each item we read is lost forever, until the iterator is empty. So, what do we do if we don't want to lose that data just by looking at it? One solution – not necessarily the best solution – is to convert the `iterator` to a `list`. 
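The single-pass behaviour is not specific to BioPython. A minimal sketch with a plain Python iterator (standing in for `gene_recs`) shows the same thing without contacting NCBI:

```python
numbers = iter([10, 20, 30])  # a plain iterator, standing in for gene_recs

first_pass = [n for n in numbers]
second_pass = [n for n in numbers]  # the iterator is already exhausted

print(first_pass)   # [10, 20, 30]
print(second_pass)  # []
```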

### To permanently keep the data from an iterator, you _could_ convert it to a `list` (inefficient)
If we want to keep the data from our iterator in memory for longer, we can convert it to a `list`, which is an `iterable`. Despite the similar name, an iterable such as a list can be traversed as many times as you want.

In [17]:
gene_recs = list(gene_recs)

Now we can run the following `for` loop as many times as we wish:

In [18]:
for i, rec in enumerate(gene_recs):
  print(f"\n\n# # # RECORD {i} # # #")
  print(rec)



# # # RECORD 0 # # #
ID: NC_011124.1
Name: NC_011124
Description: Ailurus fulgens mitochondrion, complete genome
Database cross-references: Project:30903, BioProject:PRJNA30903
Number of features: 77
/molecule_type=DNA
/topology=circular
/data_file_division=MAM
/date=14-APR-2009
/accessions=['NC_011124']
/sequence_version=1
/keywords=['RefSeq']
/source=mitochondrion Ailurus fulgens (lesser panda)
/organism=Ailurus fulgens
/taxonomy=['Eukaryota', 'Metazoa', 'Chordata', 'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', 'Eutheria', 'Laurasiatheria', 'Carnivora', 'Caniformia', 'Ailuridae', 'Ailurus']
/references=[Reference(title='Mitogenomic analyses of caniform relationships', ...), Reference(title='Direct Submission', ...), Reference(title='Direct Submission', ...)]
/comment=REVIEWED REFSEQ: This record has been curated by NCBI staff. The
reference sequence was derived from AM711897.
COMPLETENESS: full length.
Seq('GTTAATGTAGCTTAATAAATAAAGCAAGGCACTGAAAATGCCTAGATGAGTTAC...TCT')


# 

---
Go ahead and re-run that last `for` loop. You'll see that this time the data is still there.

⚠️ **Warning** ⚠️ This code above seems great, but it can be a very bad idea if you have a lot of data. `list`s are a lot less efficient than `iterator`s, because the entire dataset needs to be loaded into memory with a list, as opposed to one item at a time with an iterator.

Here we have just `4` items, so that's not a problem, but if you have a lot of data, you'll want to keep it as an iterator, even if that means you can only look at it once. Look at each item once, do what you need to do with it, and move on to the next one. That's the properly _pythonic_ thing to do, all the more so with huge bioinformatics datasets.

The advantage of a list is of course that we can reuse the data as many times as we wish without having to go back to NCBI. Here, since we're just learning, and since it's only 4 items, turning an iterator to a `list` is acceptable. You'll find yourself turning iterators to lists many times, particularly with light data, and also with portions of your data as you're developing your code (since you need to explore the data and typically don't know what you'll find upon the first read).
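When you only want to explore a portion of the data, one option is to take the first few items with `itertools.islice` and leave the rest of the iterator untouched. This is a sketch with strings standing in for records; the helper name `peek` is hypothetical.

```python
from itertools import islice


def peek(iterator, n):
    """Return the first `n` items of `iterator` as a list, consuming only those."""
    return list(islice(iterator, n))


records = iter(['rec0', 'rec1', 'rec2', 'rec3'])  # stand-in for a SeqIO.parse() iterator
head = peek(records, 2)
rest = list(records)

print(head)  # ['rec0', 'rec1']
print(rest)  # ['rec2', 'rec3']
```

Keep in mind that the items you peeked at are gone from the iterator, so hold on to the returned list if you still need them.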

But there's an even better way that lets us keep the data permanently: exporting it to a file. We explore that next.

## Saving sequence records to files with `Bio.SeqIO.write()`
If you wish to keep data such as a sequence record permanently, you can export it to a file, in some desired format. You can then load this data back in as needed. Here we will export our records to GenBank format.

First, let's use the shell to create a folder to hold the `GenBank` files:

In [19]:
gb_dir = 'GenBank' # We define a variable so that we can reuse it later in this script
! mkdir $gb_dir

mkdir: GenBank: File exists


(In my case, I already had that directory, so bash gives me a minor complaint)
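If you would rather avoid that complaint (or the shell altogether, which may be convenient on Windows), a possible alternative is to create the folder from Python with the standard-library `pathlib` module. This is a sketch, not what the rest of the notebook uses:

```python
from pathlib import Path

gb_dir = Path('GenBank')
# exist_ok=True makes this a no-op when the folder is already there
gb_dir.mkdir(parents=True, exist_ok=True)
gb_dir.mkdir(parents=True, exist_ok=True)  # repeating the call raises no error

print(gb_dir.is_dir())  # True
```

A `Path` formats as plain text inside an f-string, so an expression such as `f'{gb_dir}/some_file.gb'` keeps working.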

We still have the records in the `gene_recs` list, so we could export them easily from there. However, in a real analysis, the optimal way to do this would be to create the files directly as you read them from the NCBI record. So, with apologies to NCBI and its worldwide users, we will go back there a third time.

In [20]:
gene_handle = Entrez.efetch(db='nucleotide', id=ref_recs['IdList'], rettype='gb')
gene_recs = SeqIO.parse(gene_handle, 'gb')
print(gene_recs)

<Bio.SeqIO.InsdcIO.GenBankIterator object at 0x19ba4a6a0>


We have our iterator again. But this time we will neither waste it by printing, nor turn it inefficiently into a list. Instead, we will do the proper thing and save each of the records to its own file.

Importantly, we would benefit from some automated way to access these files later, so let's store their names in a list for now.

In [21]:
gb_file_names = [] # An empty list that will hold the file names

for rec in gene_recs:
  # Define a file name using the directory we just created, and the record's own ID:
  file_name = f'{gb_dir}/{rec.id}.gb'
  # Save the file using that file name
  SeqIO.write(rec, file_name, format='gb')

  # We also want the list of files, so let's store them:
  gb_file_names.append(file_name)

  # And print some useful information to screen, just so that we know what's going on:
  print(f"Saving record\t{rec.id}\tto file\t{file_name}")

Saving record	NC_011124.1	to file	GenBank/NC_011124.1.gb
Saving record	AM711897.1	to file	GenBank/AM711897.1.gb
Saving record	Y08511.1	to file	GenBank/Y08511.1.gb
Saving record	L21885.1	to file	GenBank/L21885.1.gb


Here, we exported each of the records verbatim to a file, and we used the corresponding record's ID (`rec.id`) to name its file.

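(Note that the record IDs don't match the IDs that we got from the search. NCBI uses multiple identifiers for sequences: the numbers in `IdList` are NCBI's numeric identifiers (GI numbers), whereas `rec.id` holds the accession and version, e.g. `L21885.1`. Right now it doesn't matter much to us, but for real bioinformatics work, you'll need to be very careful about which identifier you are using.)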

Let's take a quick look at one of those files:

In [22]:
! head GenBank/L21885.1.gb

LOCUS       AISMTRG12S               349 bp    DNA     linear   MAM 18-NOV-1993
DEFINITION  Ailurus fulgens mitochondrial 12S ribosomal RNA (12S rRNA) gene
            fragment.
ACCESSION   L21885
VERSION     L21885.1
KEYWORDS    12S ribosomal RNA.
SOURCE      mitochondrion Ailurus fulgens (lesser panda)
  ORGANISM  Ailurus fulgens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Carnivora; Caniformia;


Perfect! I suggest you also open this GenBank file externally to explore it properly (for example, you could use the JupyterLab browser).

#### Create an index file with the name and location of each of the GenBank files, to be accessed later
We will create a file containing the list of names of all the GenBank files that we created above, so that we can find them easily in our later analyses:

In [23]:
gb_file_list = 'gb_files.list'

with open(gb_file_list, 'w') as f:
  for file_name in gb_file_names:
    f.write(f"{file_name}\n")

Let's take a look:

In [24]:
! cat $gb_file_list

GenBank/NC_011124.1.gb
GenBank/AM711897.1.gb
GenBank/Y08511.1.gb
GenBank/L21885.1.gb


We will use this list of files in the following lesson, where we will read those files to find the actual DNA sequences of the 12S rRNA gene.
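For reference, reading such an index back into a Python list could look like the sketch below. The helper name `read_file_list` is hypothetical, and the demo writes a small index to a temporary folder so that it doesn't touch your real `gb_files.list`.

```python
import os
import tempfile


def read_file_list(path):
    """Return the non-empty lines of `path`, stripped of surrounding whitespace."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


# Self-contained demonstration: write a small index, then read it back.
names = ['GenBank/NC_011124.1.gb', 'GenBank/L21885.1.gb']
with tempfile.TemporaryDirectory() as tmp_dir:
    list_path = os.path.join(tmp_dir, 'gb_files.list')
    with open(list_path, 'w') as f:
        for name in names:
            f.write(f"{name}\n")
    recovered = read_file_list(list_path)

print(recovered)
```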

## Accessing the NCBI Taxonomy database
Let's download the taxonomic information for the red panda.

### _Searching_ broadly using `Entrez.esearch()`
If all you have is the name of a species, this can be very ambiguous. For example, if you just look for "fox", how could NCBI know which fox you mean? But the red panda is unique (and so very pretty), so it should give a good result using **`Entrez.esearch()`**:

In [25]:
tax_handle = Entrez.esearch(db='taxonomy', term='red panda')
tax_records = Entrez.read(tax_handle)
tax_handle.close()

We have stored the results into a dictionary called `tax_records`. Let's print all of its keys and corresponding values:

In [26]:
for key, val in tax_records.items():
  print(key, val)

Count 1
RetMax 1
RetStart 0
IdList ['9649']
TranslationSet []
TranslationStack [{'Term': 'red panda[All Names]', 'Field': 'All Names', 'Count': '1', 'Explode': 'N'}, 'GROUP']
QueryTranslation red panda[All Names]


Nice, we found a single result, and we now have its id in the `IdList`. We can use that id to download the full taxonomic record of the red panda (assuming that the number `9649` is a correct result).

### _Fetching_ specific taxonomic records using IDs in `Entrez.efetch()`
Now that we have the ID of the red panda, we can use it to download the full desired record.

This time, since we know exactly what we're looking for, we use `efetch`, instead of `esearch` which we used above when we were looking more broadly.

Let's also add the ID of the giant panda, which I know to be `9646`.

In [27]:
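tax_handle = Entrez.efetch(db='taxonomy', id=[9649, 9646])
tax_records = Entrez.read(tax_handle) # Parse the whole response into a list of records
tax_handle.close()
tax_records = list(tax_records) # Optional: convert the parsed result to a plain Python list


(Unlike `SeqIO.parse()`, which gave us a one-pass iterator earlier, `Entrez.read()` parses the whole response at once and returns a list-like object that you can loop over as many times as you wish. The `list()` conversion on the last line of the cell is therefore optional. What you can only read once is the handle itself, so parse it a single time and keep the result, rather than going back to NCBI again and again.)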

In [28]:
for tax_rec in tax_records:
  for key, val in tax_rec.items():
    print(key, val, sep='\t:\t')
  print('\n')

TaxId	:	9649
ScientificName	:	Ailurus fulgens
OtherNames	:	{'Anamorph': [], 'Includes': [], 'EquivalentName': [], 'GenbankAnamorph': [], 'CommonName': ['red panda'], 'GenbankSynonym': [], 'Name': [{'ClassCDE': 'authority', 'DispName': 'Ailurus fulgens Cuvier, 1825'}], 'Misnomer': [], 'Misspelling': [], 'Synonym': [], 'Teleomorph': [], 'Acronym': [], 'Inpart': [], 'GenbankCommonName': 'lesser panda'}
ParentTaxId	:	9648
Rank	:	species
Division	:	Mammals
GeneticCode	:	{'GCId': '1', 'GCName': 'Standard'}
MitoGeneticCode	:	{'MGCId': '2', 'MGCName': 'Vertebrate Mitochondrial'}
Lineage	:	cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Dipnotetrapodomorpha; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Boreoeutheria; Laurasiatheria; Carnivora; Caniformia; Ailuridae; Ailurus
LineageEx	:	[{'TaxId': '131567', 'ScientificName': 'cellular organisms', 'Rank': 'no rank'

Nice! There's a lot of very useful information there. We could for example extract the `Lineage` for each of these organisms very easily, and store it in a file that we can use later.
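As a small sketch of that idea, the `Lineage` value is a single string with taxa separated by semicolons, so it can be split into a list before storing or comparing it. The string below is copied from the red panda output above; the helper name `split_lineage` is hypothetical.

```python
# The Lineage string of the red panda, copied from the output above:
lineage = ("cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; "
           "Bilateria; Deuterostomia; Chordata; Craniata; Vertebrata; "
           "Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; "
           "Dipnotetrapodomorpha; Tetrapoda; Amniota; Mammalia; Theria; "
           "Eutheria; Boreoeutheria; Laurasiatheria; Carnivora; Caniformia; "
           "Ailuridae; Ailurus")


def split_lineage(lineage_text):
    """Turn an NCBI 'a; b; c' lineage string into a list of taxon names."""
    return [name.strip() for name in lineage_text.split(';') if name.strip()]


taxa = split_lineage(lineage)
print(taxa[0])              # cellular organisms
print(taxa[-1])             # Ailurus
print('Carnivora' in taxa)  # True
```

In a real run you would pass `tax_rec['Lineage']` for each record instead of a hard-coded string. Note that the plain `Lineage` carries names only; the ranks live in `LineageEx`.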

#### Homework
There is a rumour that red pandas are not related to giant pandas – that giant pandas belong to the larger bear family whereas red pandas are in their own group near the raccoons and other musteloids. Print out the `'family'` of the two animals to see what we get.

**Advanced real-world kind of homework:** Create a tab-separated file named `panda_taxonomies.tsv` that contains the following columns:
1. The `TaxId`.
1. The `CommonName`, if present, otherwise the `GenbankCommonName`.
1. The `Lineage`.
1. The following ranks of the `LineageEx`, if present, otherwise empty, each in its own column:
   1. `kingdom`.
   1. `phylum`.
   1. `order`.
   1. `family`.
   1. `genus`.