This repository has been archived by the owner. It is now read-only.
Scripts for updating annotations and reference sequences from UCSC
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


Scripts for updating annotations and reference sequences from UCSC

The following scripts from gwips_tools can be used to update genome annotations and RefSeq's from UCSC.



MySQL server and MySQL-python package

On Ubuntu 14.04, you can install the following packages to get all the requirements - mysql-server, libmysqlclient-dev and python-mysqldb.


For both the scripts, the -h flag prints a help message.

To get the list of all available genomes from configuration:

sudo python -l

To update annotations for a genome for example, hg19:

sudo python -g hg19

Reload MySQL:

sudo service mysql reload

To update RefSeq's:

sudo python -g hg19

For convenience, a sample bash script is included which can be used to update the annotations followed by the reference sequences. Copy to and then modify the list of genomes to update and the path.

Why sudo?

Configuration - adding/updating available genomes

List of available genomes is done in the file config.json. Additional genomes can be added to the "genomes" section. Datasets to download can be specified in the "datasets" key of the respective genome.

Other variables

Source URL for downloading the mysql tables.
Downloaded MySQL tables are stored in this directory.
Source URL for RefSeq sequences.
Downloaded RefSeq's are stored in this directory.
User with write privileges to refseq_target_dir.
User with write privileges to target_dir.

Note regarding permissions

User/group id's for downloaded files are set in config.json

annotations_user = 'mysql'  # for mysql tables
refseq_user = 'vimal'  # for refseq sequences

Both the update scripts are run using sudo for the following reason. requires write access to /var/lib/mysql which is owned by mysql. When downloading datasets, the user and group id is changed to that of the mysql user (annotations_user) on the system. downloads sequences to directory specified under refseq_target_dir. These are owned by user specified in refseq_user.


Please copy tests/data/config.json.sample to tests/data/config.json Change paths for refseq_target_dir, target_dir and backup_dir.

Now run the test suite from the main source directory


Test configuration can be updated in gwips_tools/