Introduction {#chapter:introduction}
============

What is Biopython?
------------------

The Biopython Project is an international association of developers of
freely available Python (<http://www.python.org>) tools for
computational molecular biology. Python is an object-oriented,
interpreted, flexible language that is becoming increasingly popular for
scientific computing. Python is easy to learn, has a very clear syntax
and can easily be extended with modules written in C, C++ or FORTRAN.

The Biopython web site (<http://www.biopython.org>) provides an online
resource for modules, scripts, and web links for developers of
Python-based software for bioinformatics use and research. Biopython
features include parsers for various bioinformatics file formats (BLAST,
ClustalW, FASTA, GenBank, ...), access to online services (NCBI,
ExPASy, ...), interfaces to common and not-so-common programs (ClustalW,
DSSP, MSMS, ...), a standard sequence class, various clustering modules,
a KD tree data structure, and even documentation.

Basically, we just like to program in Python and want to make it as easy
as possible to use Python for bioinformatics by creating high-quality,
reusable modules, classes and scripts.

What can I find in the Biopython package?
-----------------------------------------

The main Biopython releases have lots of functionality, including:

-   The ability to parse bioinformatics files into data structures
    usable from Python, including support for the following formats:

    -   BLAST output, from both standalone and WWW BLAST

    -   ClustalW

    -   FASTA

    -   GenBank

    -   PubMed and Medline

    -   ExPASy files, like Enzyme and Prosite

    -   SCOP, including ‘dom’ and ‘lin’ files

    -   UniGene

    -   Swiss-Prot

-   Files in the supported formats can be iterated over record by record
    or indexed and accessed via a dictionary interface.

-   Code to deal with popular online bioinformatics destinations such
    as:

    -   NCBI – BLAST, Entrez and PubMed services

    -   ExPASy – Swiss-Prot and Prosite entries, as well as Prosite
        searches

-   Interfaces to common bioinformatics programs such as:

    -   Standalone BLAST from NCBI

    -   The ClustalW alignment program

    -   EMBOSS command line tools

-   A standard sequence class that deals with sequences, sequence IDs,
    and sequence features.

-   Tools for performing common operations on sequences, such as
    translation, transcription and weight calculations.

-   Code to perform classification of data using k Nearest Neighbors,
    Naive Bayes or Support Vector Machines.

-   Code for dealing with alignments, including a standard way to create
    and deal with substitution matrices.

-   Code making it easy to split up parallelizable tasks into
    separate processes.

-   GUI-based programs to do basic sequence manipulations, translations,
    BLASTing, etc.

-   Extensive documentation and help with using the modules, including
    this file, online wiki documentation, the web site, and the
    mailing list.

-   Integration with BioSQL, a sequence database schema also supported
    by the BioPerl and BioJava projects.

We hope this gives you plenty of reasons to download and start using
Biopython!
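
As a rough illustration of the first few items in the list above (parsing a
file format record by record, then translating the sequences), here is a
minimal sketch using `Bio.SeqIO`. The two FASTA records, the `summarize`
helper and the variable names are made up for this example, and the import is
done lazily so that the script still runs, with a message, when Biopython is
not installed.

```python
from io import StringIO

# Two small, made-up FASTA records held in memory instead of a file on disk.
FASTA_TEXT = """>seq1 example record
ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
>seq2
ATGAAATAG
"""


def summarize(handle):
    """Yield (id, length, protein) for each FASTA record in the handle."""
    # Imported here so that a missing installation surfaces as ImportError
    # at the point of use rather than when the script is loaded.
    from Bio import SeqIO

    for record in SeqIO.parse(handle, "fasta"):
        # to_stop=True ends the translation at the first stop codon.
        protein = record.seq.translate(to_stop=True)
        yield record.id, len(record.seq), str(protein)


try:
    rows = list(summarize(StringIO(FASTA_TEXT)))
except ImportError:
    rows = []
    print("Biopython is not installed; see the installation section.")

for identifier, length, protein in rows:
    print(identifier, length, protein)
```

With a real file you would pass an open file handle or a filename to
`SeqIO.parse` instead of the `StringIO` object used here.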

Installing Biopython
--------------------

All of the installation information for Biopython was separated from
this document to make it easier to keep updated.

The short version: go to our downloads page
(<http://biopython.org/wiki/Download>), download and install the listed
dependencies, then download and install Biopython. Biopython runs on
many platforms (Windows, Mac, and the various flavors of Linux and
Unix). For Windows we provide pre-compiled click-and-run installers,
while for Unix and other operating systems you must install from source
as described in the included README file. This is usually as simple as
the standard commands:

    python setup.py build
    python setup.py test
    sudo python setup.py install

(You can in fact skip the build and test, and go straight to the install
– but it's better to make sure everything seems to be working.)

The longer version of our installation instructions covers installation
of Python, Biopython dependencies and Biopython itself. It is available
in PDF (<http://biopython.org/DIST/docs/install/Installation.pdf>) and
HTML formats
(<http://biopython.org/DIST/docs/install/Installation.html>).

Frequently Asked Questions (FAQ)
--------------------------------

1.  *How do I cite Biopython in a scientific publication?*\
    Please cite our application note @cock2009 [Cock *et al.*, 2009] as
    the main Biopython reference. In addition, please cite any
    publications from the following list if appropriate, in particular
    as a reference for specific modules within Biopython (more
    information can be found on our website):

    -   For the official project announcement: @chapman2000 [Chapman and
        Chang, 2000];

    -   For `Bio.PDB`: @hamelryck2003a [Hamelryck and Manderick, 2003];

    -   For `Bio.Cluster`: @dehoon2004 [De Hoon *et al.*, 2004];

    -   For `Bio.Graphics.GenomeDiagram`: @pritchard2006 [Pritchard
        *et al.*, 2006];

    -   For `Bio.Phylo` and `Bio.Phylo.PAML`: @talevich2012 [Talevich
        *et al.*, 2012];

    -   For the FASTQ file format as supported in Biopython, BioPerl,
        BioRuby, BioJava, and EMBOSS: @cock2010 [Cock *et al.*, 2010].

2.  *How should I capitalize “Biopython”? Is “BioPython” OK?*\
    The correct capitalization is “Biopython”, not “BioPython” (even
    though that would have matched BioPerl, BioJava and BioRuby).

3.  *What is going wrong with my print commands?*\
    This tutorial now uses the Python 3 style print *function*. As of
    Biopython 1.62, we support both Python 2 and Python 3. The most
    obvious language difference is that the print *statement* in
    Python 2 became a print *function* in Python 3.

    For example, this will only work under Python 2:



In [None]:
print "Hello World!"



    If you try that on Python 3 you’ll get a `SyntaxError`. Under Python
    3 you must write:



In [None]:
print("Hello World!")



    Surprisingly that will also work on Python 2 – but only for simple
    examples printing one thing. In general you need to add this magic
    line to the start of your Python scripts to use the print function
    under Python 2.6 and 2.7:



In [None]:
from __future__ import print_function



    If you forget to add this magic import, under Python 2 you’ll see
    extra brackets produced by trying to use the print function when
    Python 2 is interpreting it as a print statement and a tuple.
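
The "extra brackets" described above can be reproduced without a Python 2
interpreter. The following is only an illustrative sketch for Python 3. It
captures what the print function writes for two arguments and compares it
with the tuple representation that a Python 2 print statement would show for
the same parenthesized expression. The variable names are ours and are not
part of Biopython.

```python
import io

words = ("Hello", "World!")

# Python 3 print function: the arguments are written separated by a space.
buffer = io.StringIO()
print(*words, file=buffer)
function_output = buffer.getvalue().rstrip("\n")

# Python 2 print statement (without the __future__ import): the parentheses
# are read as a tuple, so the tuple's representation is printed instead.
statement_output = repr(words)

print(function_output)
print(statement_output)
```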

4.  *How do I find out what version of Biopython I have installed?*\
    Use this:



In [None]:
import Bio
print(Bio.__version__)
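
If Biopython might not be installed at all, the check above stops with an
`ImportError`. A slightly more defensive variant is sketched below. The helper
name `biopython_version` is hypothetical and only wraps the same
`Bio.__version__` attribute shown above.

```python
import importlib


def biopython_version():
    """Return the installed Biopython version string, or None if unavailable."""
    try:
        bio = importlib.import_module("Bio")
    except ImportError:
        return None
    # Fall back to None if the attribute is missing for any reason.
    return getattr(bio, "__version__", None)


version = biopython_version()
if version is None:
    print("Biopython does not appear to be installed.")
else:
    print("Biopython version:", version)
```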
