Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Newer
Older
100755 106 lines (89 sloc) 2.724 kb
76e227a @wwood added partially tested version of script to mask a fasta where blast …
authored
1 #!/usr/bin/env ruby
2
3 require 'rubygems'
4 require 'bio'
e9e1749 @wwood blast_mask.rb now tested
authored
5 require 'fastercsv'
76e227a @wwood added partially tested version of script to mask a fasta where blast …
authored
6
7 # A script to mask out certain sections of a fasta file as
8 # specified by their homology to some blast database
9
10 class BlastHitArray < Array
11 # Taking a string sequence, apply the mask encoded in the Array superclass
12 # and return the masked string sequence
13 def masked_sequence(sequence)
14 seq = ''
15
16 (1..sequence.length).each do |i|
17 # run the gauntlet of masks, innocent until proven guilty
18 masked = false
19 each do |hit|
20 if hit.contains?(i)
21 masked = true
22 break
23 end
24 end
25
26 # print a mask character or not, depending on verdict
27 if masked
28 seq += 'X'
29 else
30 seq += sequence[i-1..i-1] #sequence is 0-based index, whereas hits are 1-based
31 end
32 end
33 return seq
34 end
35 end
36
37 class Hit
38 attr_accessor :from, :to
39 def initialize(from, to)
e9e1749 @wwood blast_mask.rb now tested
authored
40 raise Exception, "Integers needed" unless from.kind_of?(Integer) and to.kind_of?(Integer)
76e227a @wwood added partially tested version of script to mask a fasta where blast …
authored
41 @from = from
42 @to = to
43 end
44
45 def contains?(one_based_index)
8b4dfab @wwood blast_mask bugfix - previously incorrect when dealing with reverse hi…
authored
46 # If hit is a forward hit
47 if from < to
48 one_based_index >= from and one_based_index <= to
49 else #If hit is a reverse hit
50 one_based_index >= to and one_based_index <= from
51 end
76e227a @wwood added partially tested version of script to mask a fasta where blast …
authored
52 end
53 end
54
55 if __FILE__ == $0
6b9c192 @wwood blast_mask now has the capability to only print out sequences that ha…
authored
56 require 'optparse'
57
58 # Parse cmd line options
59 USAGE = "Usage: blast_mask.rb [-m] <fasta_filename> <blast_filename>"
60 options = {
61 :print_masked_sequences_only => false,
62 }
63 o = OptionParser.new do |opts|
64 opts.banner = USAGE
65
66 opts.on("-m", "--masked-only", "Print out only those sequences that have blast hits (or equivalently, only those that are masked") do |v|
67 options[:print_masked_sequences_only] = true
68 end
69 end
70 o.parse!
71
72 unless ARGV.length == 2
73a359d @wwood fixed typo in usage
authored
73 $stderr.puts USAGE
6b9c192 @wwood blast_mask now has the capability to only print out sequences that ha…
authored
74 exit 1
75 end
76
76e227a @wwood added partially tested version of script to mask a fasta where blast …
authored
77 fasta_filename = ARGV[0]
78 blast_filename = ARGV[1]
79
6b9c192 @wwood blast_mask now has the capability to only print out sequences that ha…
authored
80
81
76e227a @wwood added partially tested version of script to mask a fasta where blast …
authored
82 # read in blast data
83 blasts = {} #hash of sequence identifiers to arrays of blasthits
e9e1749 @wwood blast_mask.rb now tested
authored
84 FasterCSV.foreach(blast_filename, :col_sep => "\t") do |row|
76e227a @wwood added partially tested version of script to mask a fasta where blast …
authored
85 query = row[0]
e9e1749 @wwood blast_mask.rb now tested
authored
86 from = row[6].to_i
87 to = row[7].to_i
76e227a @wwood added partially tested version of script to mask a fasta where blast …
authored
88
89 blasts[query] ||= BlastHitArray.new
90 blasts[query].push Hit.new(from, to)
91 end
92
93 # read and print fasta file with the masking done
e9e1749 @wwood blast_mask.rb now tested
authored
94 Bio::FlatFile.foreach(Bio::FastaFormat, fasta_filename) do |entry|
95 # Apply masking if required
21944a1 @wwood blast_mask now handles more complex names of sequences
authored
96 if blasts[entry.definition]
97 masked = blasts[entry.definition].masked_sequence(entry.seq)
98 puts ">#{entry.definition}"
e9e1749 @wwood blast_mask.rb now tested
authored
99 puts masked
76e227a @wwood added partially tested version of script to mask a fasta where blast …
authored
100 else
6b9c192 @wwood blast_mask now has the capability to only print out sequences that ha…
authored
101 # No matching blast hits recorded. Print unless
102 # the config says not to print these ones
103 puts entry unless options[:print_masked_sequences_only]
76e227a @wwood added partially tested version of script to mask a fasta where blast …
authored
104 end
105 end
106 end
Something went wrong with that request. Please try again.