Reads metadata from Integrated Microbial Genomes (IMG) metadata files. Metadata files are generated by searching for one or more taxa and then exporting some or all of the genome-specific characteristics, e.g. kingdom, genus, temperature range and taxon identifier.


gem install bio-img_metadata


require 'bio-img_metadata'

d = DATA_DIR, 'head.metadata.csv') #=> an Array of Bio::IMG::Lineage objects, one per genome

d.length.should == 9 #=> The array has 9 members, one for each line in the metadata file
d[0].kind_of?(Bio::IMG::Lineage).should == true #=> each member is a Bio::IMG::Lineage object

d[0].domain.should == 'Archaea' #=> some columns are exposed as methods (mostly the taxonomy-related ones)
d[1].taxon_id.should == 2515075008

d[0].attributes['Status'].should == 'Finished' #=> the remaining columns are in the attributes hash, keyed by column name
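Each Lineage object above wraps one row of the export. To make that row-to-object structure concrete without the gem, here is a minimal stdlib-only sketch. `read_img_rows` is a hypothetical helper, not part of bio-img_metadata, and it assumes a tab-delimited export whose first line holds the column names; the sample data and column names are made up.

```ruby
# Hypothetical helper, not part of bio-img_metadata.
# Assumes a tab-delimited IMG export whose first line holds the column names.
def read_img_rows(text, sep: "\t")
  # chomp strips a trailing "\n", "\r\n" or lone "\r" from each line
  lines = text.each_line.map(&:chomp).reject(&:empty?)
  return [] if lines.empty?

  header = lines.shift.split(sep, -1)
  lines.each_with_index.map do |line, i|
    # limit -1 keeps trailing empty cells instead of dropping them
    cells = line.split(sep, -1)
    unless cells.length == header.length
      raise ArgumentError,
            "data row #{i + 1}: expected #{header.length} fields, got #{cells.length}"
    end
    header.zip(cells).to_h # one Hash per genome, keyed by column name
  end
end

# Usage with a made-up sample (column names assumed)
sample = "taxon_oid\tDomain\tStatus\n1\tArchaea\tFinished\n"
rows = read_img_rows(sample)
puts rows[0]['Status'] # prints Finished
```

Raising on a field-count mismatch, rather than letting `zip` silently drop or pad cells, surfaces malformed rows such as a taxon with extra fields.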

How to get the metadata file

Go to IMG > Genome Browser:

In the Table Configuration section:

  • Genome Field > Click All
  • Project Metadata > Click All
  • Data Statistics > Click All

Click Display Genomes Again. In the Genome Browser section > Click Select All. Finally, click the Export button.

PS/ Don't trust the IMG metadata too much: it contains some big mistakes, e.g. in the 16S copy number.

PS2/ What did I do to create the FIXED metadata?

  • Deleted two occurrences of "\r" (^M)
  • taxon_oid 2515154013 had two extra fields: removed the two cells containing "Human wound, cranian"
  • Replaced cells containing "-1" with ""
  • Replaced 'Marine archaeal group 1 BG20 (Nitrosoarchaeum limnia BG20)' with 'Nitrosoarchaeum limnia BG20'
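Three of the fixes above (the "\r" deletion, the "-1" blanking and the rename) are mechanical, so they could be scripted. The following is a minimal sketch under stated assumptions, not the procedure actually used to build the FIXED file: it assumes a tab-delimited export and that the "-1" replacement and the rename apply to whole cells. `fix_img_metadata` and `RENAMES` are hypothetical names. The taxon 2515154013 fix is left out because the exact position of the extra cells is not specified.

```ruby
# Hypothetical cleanup helper mirroring some of the manual fixes above.
# Assumes a tab-delimited IMG export; renames match whole cells only.
RENAMES = {
  'Marine archaeal group 1 BG20 (Nitrosoarchaeum limnia BG20)' =>
    'Nitrosoarchaeum limnia BG20'
}.freeze

def fix_img_metadata(text, sep: "\t")
  # Remove every carriage return (^M), including any "\r\n" line endings
  text.delete("\r").each_line.map do |line|
    cells = line.chomp.split(sep, -1)
    cells.map! do |cell|
      if cell == '-1'
        ''                        # blank out cells that are exactly "-1"
      else
        RENAMES.fetch(cell, cell) # rename listed cells, keep the rest
      end
    end
    cells.join(sep) + "\n"
  end.join
end
```

Matching whole cells means a value such as "-10" is left untouched, which a plain substring replacement would not guarantee.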

(Download instructions kindly contributed by @fangly / Florent Angly)

