# MARC mappings
This notebook generates and allows you to view the mapping of MARC to the Argot data language used by TRLN Discovery.

First I need to bring in the mappings data from the Excel file that lives at https://github.com/trln/data-documentation/blob/master/argot/argot.xlsx 



In [735]:
require 'pp'
require 'simple_xlsx_reader'
doc = SimpleXlsxReader.open('../argot/argot.xlsx')
mappings_d = doc.sheets[1].data
mappings_h = doc.sheets[1].headers
fields_d = doc.sheets[0].data
fields_h = doc.sheets[0].headers
puts ''




Then, I throw out mappings that are not from standard MARC source data and take a look at the data.

In [737]:
marc_only = mappings_d.select{ |r| r[2] == "MARC"}
puts "Number of mappings: #{marc_only.size}"
marc_only.first(3).each { |m| puts m}
puts ''

Number of mappings: 669
["misc_id", "misc_id[type]", "MARC", "n", "GEN", "15", "2", "none", "map subelement to value", "type = \"National Bibliography Number\" if there is no $2; otherwise, map $2 using https://github.com/trln/marc-to-argot/blob/master/lib/translation_maps/shared/national_bibliography_codes.yaml", "Mapping is from: https://www.loc.gov/standards/sourcelist/national-bibliography.html", "misc_id[type]GEN152none", "0", "0", "n", "y", "y"]
["edition", "edition[label]", "MARC", "n", "GEN", "250", "3", "none", "concat_subelements", "x", ".", "edition[label]GEN2503none", "0", "0", "n", "n", "y"]
["imprint_main", "imprint_main[label]", "MARC", "n", "GEN", "260", "3", "If >1 260/264, chosen from latest (i1 = 3) or last", "subelement_to_value", "strip colon or other non-enclosing punctuation from end of subfield.", "x", "imprint_main[label]GEN2603If >1 260/264, chosen from latest (i1 = 3) or last", "0", "0", "n", "?", "https://github.com/trln/marc-to-argot/blob/TD-443-publication

There's a lot of info here we don't need, so I'm going to keep only what we need: 
* 0 -- parentfield
* 1 -- field
* 3 -- provisional?
* 4 -- institution
* 5 -- element/field
* 6 -- subelement/field(s)
* 7 -- constraints
* 8 -- processing_type
* 9 -- processing instructions
* 10 -- notes

In [738]:
less_data = marc_only.map { |m| [m[0], m[1], m[3], m[4], m[5], m[6], m[7], m[8], m[9], m[10]]}
less_data.first(3).each { |m| puts m}
puts ''

["misc_id", "misc_id[type]", "n", "GEN", "15", "2", "none", "map subelement to value", "type = \"National Bibliography Number\" if there is no $2; otherwise, map $2 using https://github.com/trln/marc-to-argot/blob/master/lib/translation_maps/shared/national_bibliography_codes.yaml", "Mapping is from: https://www.loc.gov/standards/sourcelist/national-bibliography.html"]
["edition", "edition[label]", "n", "GEN", "250", "3", "none", "concat_subelements", "x", "."]
["imprint_main", "imprint_main[label]", "n", "GEN", "260", "3", "If >1 260/264, chosen from latest (i1 = 3) or last", "subelement_to_value", "strip colon or other non-enclosing punctuation from end of subfield.", "x"]



Sort by MARC tag and print out the first 10

In [739]:
less_data.sort! {|a,b| a[4] <=> b[4]}
less_data.first(10).each { |m| puts m }
puts ''

["oclc_number", "oclc_number[value]", "n", "UNC", "1", "{whole field}", "complex contraint logic", "complex", "see notes", "https://github.com/trln/marc-to-argot/blob/master/spec/unc_oclcnum_spec.rb"]
["sersol_number", "sersol_number", "n", "UNC", "1", "{whole field}", "value =~ /^ss([ej]|[ie]b)\\d+$/", "subelement_to_value", "x", "x"]
["vendor_marc_id", "vendor_marc_id", "n", "UNC", "1", "{whole field}", "value does not meet criteria to be OCLC Number or Sersol Number", "subelement_to_value", "x", "x"]
["misc_id", "misc_id[value]", "n", "GEN", "10", "a", "none", "subelement_to_value", "misc_id[qual]   = \"\";;;misc_id[type] = \"LCCN\"", "x"]
["misc_id", "misc_id[value]", "n", "GEN", "10", "b", "none", "subelement_to_value", "misc_id[qual]  = \"\";;;misc_id[type] = \"NUCMC\"", "x"]
["misc_id", "misc_id[value]", "n", "GEN", "10", "z", "none", "subelement_to_value", "misc_id[qual]  = \"\";;;misc_id[type] = \"Canceled or invalid LCCN\"", "x"]
["names", "names[type]", "n", "GEN", "100", "{

Our MARC tags beginning with zeros need to be padded out, so let's fix that. 

In [740]:
fix_tags = less_data.each { |m| m[4] = m[4].rjust(3,'0') }
fix_tags.first(5).each { |m| puts m}
puts ''

["oclc_number", "oclc_number[value]", "n", "UNC", "001", "{whole field}", "complex contraint logic", "complex", "see notes", "https://github.com/trln/marc-to-argot/blob/master/spec/unc_oclcnum_spec.rb"]
["sersol_number", "sersol_number", "n", "UNC", "001", "{whole field}", "value =~ /^ss([ej]|[ie]b)\\d+$/", "subelement_to_value", "x", "x"]
["vendor_marc_id", "vendor_marc_id", "n", "UNC", "001", "{whole field}", "value does not meet criteria to be OCLC Number or Sersol Number", "subelement_to_value", "x", "x"]
["misc_id", "misc_id[value]", "n", "GEN", "010", "a", "none", "subelement_to_value", "misc_id[qual]   = \"\";;;misc_id[type] = \"LCCN\"", "x"]
["misc_id", "misc_id[value]", "n", "GEN", "010", "b", "none", "subelement_to_value", "misc_id[qual]  = \"\";;;misc_id[type] = \"NUCMC\"", "x"]



Change GEN to ALL in institution element

In [741]:
inst_fix = fix_tags.each { |m| m[3] = m[3].sub('GEN', 'ALL') }
inst_fix.first(5).each { |m| puts m}
puts ''

["oclc_number", "oclc_number[value]", "n", "UNC", "001", "{whole field}", "complex contraint logic", "complex", "see notes", "https://github.com/trln/marc-to-argot/blob/master/spec/unc_oclcnum_spec.rb"]
["sersol_number", "sersol_number", "n", "UNC", "001", "{whole field}", "value =~ /^ss([ej]|[ie]b)\\d+$/", "subelement_to_value", "x", "x"]
["vendor_marc_id", "vendor_marc_id", "n", "UNC", "001", "{whole field}", "value does not meet criteria to be OCLC Number or Sersol Number", "subelement_to_value", "x", "x"]
["misc_id", "misc_id[value]", "n", "ALL", "010", "a", "none", "subelement_to_value", "misc_id[qual]   = \"\";;;misc_id[type] = \"LCCN\"", "x"]
["misc_id", "misc_id[value]", "n", "ALL", "010", "b", "none", "subelement_to_value", "misc_id[qual]  = \"\";;;misc_id[type] = \"NUCMC\"", "x"]



Now we create a more complex data structure out of our mappings, so that we can report out on them in a structured way. The hash we create below will look like: 

<pre>
{MARC TAG => 
  {INSTITUTION => 
    {CONSTRAINT => 
      {ARGOT FIELD =>     
        {ARGOT ELEMENT => { :subfields => x,
                          :processing_type => x,
                          :processing instructions => x,
                          :notes => x,
                          :provisional => x }
     }
    }
   }
}
<pre>
First we set up the MARC TAGs for population...

In [742]:
map_hash = {}
fix_tags.each do |m|
  map_hash[m[4]] = {} unless map_hash.has_key?(m[4])
end
map_hash.first(5).each { |h| puts h }

["001", {}]
["010", {}]
["100", {}]
["110", {}]
["111", {}]


[["001", {}], ["010", {}], ["100", {}], ["110", {}], ["111", {}]]

Then we set up the INSTITUTION level structure for population...

In [743]:
fix_tags.each do |m|
  map_hash[m[4]][m[3]] = {} unless map_hash[m[4]].has_key?(m[3])
end
map_hash.first(5).each { |h| puts h }

["001", {"UNC"=>{}}]
["010", {"ALL"=>{}}]
["100", {"ALL"=>{}}]
["110", {"ALL"=>{}}]
["111", {"ALL"=>{}}]


[["001", {"UNC"=>{}}], ["010", {"ALL"=>{}}], ["100", {"ALL"=>{}}], ["110", {"ALL"=>{}}], ["111", {"ALL"=>{}}]]

Then we set up the CONSTRAINT level structure for population...

In [744]:
fix_tags.each do |m|
  map_hash[m[4]][m[3]][m[6]] = {} unless map_hash[m[4]][m[3]].has_key?(m[6])
end
map_hash.first(5).each { |h| puts h }

["001", {"UNC"=>{"complex contraint logic"=>{}, "value =~ /^ss([ej]|[ie]b)\\d+$/"=>{}, "value does not meet criteria to be OCLC Number or Sersol Number"=>{}}}]
["010", {"ALL"=>{"none"=>{}}}]
["100", {"ALL"=>{"none"=>{}, "$t present"=>{}, "100$t present OR 240 field present"=>{}}}]
["110", {"ALL"=>{"$t present"=>{}, "none"=>{}, "110$t present OR 240 field present"=>{}}}]
["111", {"ALL"=>{"$t present"=>{}, "none"=>{}, "111$t present OR 240 field present"=>{}}}]


[["001", {"UNC"=>{"complex contraint logic"=>{}, "value =~ /^ss([ej]|[ie]b)\\d+$/"=>{}, "value does not meet criteria to be OCLC Number or Sersol Number"=>{}}}], ["010", {"ALL"=>{"none"=>{}}}], ["100", {"ALL"=>{"none"=>{}, "$t present"=>{}, "100$t present OR 240 field present"=>{}}}], ["110", {"ALL"=>{"$t present"=>{}, "none"=>{}, "110$t present OR 240 field present"=>{}}}], ["111", {"ALL"=>{"$t present"=>{}, "none"=>{}, "111$t present OR 240 field present"=>{}}}]]

Then we set up the ARGOT FIELD level for population

In [745]:
fix_tags.each do |m|
  map_hash[m[4]][m[3]][m[6]][m[0]] = {} unless map_hash[m[4]][m[3]][m[6]].has_key?(m[0])
end
map_hash.first(5).each { |h| puts h }

["001", {"UNC"=>{"complex contraint logic"=>{"oclc_number"=>{}}, "value =~ /^ss([ej]|[ie]b)\\d+$/"=>{"sersol_number"=>{}}, "value does not meet criteria to be OCLC Number or Sersol Number"=>{"vendor_marc_id"=>{}}}}]
["010", {"ALL"=>{"none"=>{"misc_id"=>{}}}}]
["100", {"ALL"=>{"none"=>{"names"=>{}}, "$t present"=>{nil=>{}, "this_work"=>{}}, "100$t present OR 240 field present"=>{"this_work"=>{}}}}]
["110", {"ALL"=>{"$t present"=>{"this_work"=>{}, nil=>{}}, "none"=>{"names"=>{}}, "110$t present OR 240 field present"=>{"this_work"=>{}}}}]
["111", {"ALL"=>{"$t present"=>{"this_work"=>{}, nil=>{}}, "none"=>{"names"=>{}}, "111$t present OR 240 field present"=>{"this_work"=>{}}}}]


[["001", {"UNC"=>{"complex contraint logic"=>{"oclc_number"=>{}}, "value =~ /^ss([ej]|[ie]b)\\d+$/"=>{"sersol_number"=>{}}, "value does not meet criteria to be OCLC Number or Sersol Number"=>{"vendor_marc_id"=>{}}}}], ["010", {"ALL"=>{"none"=>{"misc_id"=>{}}}}], ["100", {"ALL"=>{"none"=>{"names"=>{}}, "$t present"=>{nil=>{}, "this_work"=>{}}, "100$t present OR 240 field present"=>{"this_work"=>{}}}}], ["110", {"ALL"=>{"$t present"=>{"this_work"=>{}, nil=>{}}, "none"=>{"names"=>{}}, "110$t present OR 240 field present"=>{"this_work"=>{}}}}], ["111", {"ALL"=>{"$t present"=>{"this_work"=>{}, nil=>{}}, "none"=>{"names"=>{}}, "111$t present OR 240 field present"=>{"this_work"=>{}}}}]]

Then we set up the ARGOT ELEMENT level structure for population...

In [746]:
fix_tags.each do |m|
  map_hash[m[4]][m[3]][m[6]][m[0]][m[1]] = [] unless map_hash[m[4]][m[3]][m[6]][m[0]].has_key?(m[1])
end
map_hash.first(5).each { |h| puts h }

["001", {"UNC"=>{"complex contraint logic"=>{"oclc_number"=>{"oclc_number[value]"=>[]}}, "value =~ /^ss([ej]|[ie]b)\\d+$/"=>{"sersol_number"=>{"sersol_number"=>[]}}, "value does not meet criteria to be OCLC Number or Sersol Number"=>{"vendor_marc_id"=>{"vendor_marc_id"=>[]}}}}]
["010", {"ALL"=>{"none"=>{"misc_id"=>{"misc_id[value]"=>[]}}}}]
["100", {"ALL"=>{"none"=>{"names"=>{"names[type]"=>[], "names[name]"=>[], "names[rel]"=>[]}}, "$t present"=>{nil=>{"this_work[type]"=>[]}, "this_work"=>{"this_work[title]"=>[]}}, "100$t present OR 240 field present"=>{"this_work"=>{"this_work[author]"=>[]}}}}]
["110", {"ALL"=>{"$t present"=>{"this_work"=>{"this_work[title]"=>[]}, nil=>{"this_work[type]"=>[]}}, "none"=>{"names"=>{"names[type]"=>[], "names[name]"=>[], "names[rel]"=>[]}}, "110$t present OR 240 field present"=>{"this_work"=>{"this_work[author]"=>[]}}}}]
["111", {"ALL"=>{"$t present"=>{"this_work"=>{"this_work[title]"=>[]}, nil=>{"this_work[type]"=>[]}}, "none"=>{"names"=>{"names[type]"=

[["001", {"UNC"=>{"complex contraint logic"=>{"oclc_number"=>{"oclc_number[value]"=>[]}}, "value =~ /^ss([ej]|[ie]b)\\d+$/"=>{"sersol_number"=>{"sersol_number"=>[]}}, "value does not meet criteria to be OCLC Number or Sersol Number"=>{"vendor_marc_id"=>{"vendor_marc_id"=>[]}}}}], ["010", {"ALL"=>{"none"=>{"misc_id"=>{"misc_id[value]"=>[]}}}}], ["100", {"ALL"=>{"none"=>{"names"=>{"names[type]"=>[], "names[name]"=>[], "names[rel]"=>[]}}, "$t present"=>{nil=>{"this_work[type]"=>[]}, "this_work"=>{"this_work[title]"=>[]}}, "100$t present OR 240 field present"=>{"this_work"=>{"this_work[author]"=>[]}}}}], ["110", {"ALL"=>{"$t present"=>{"this_work"=>{"this_work[title]"=>[]}, nil=>{"this_work[type]"=>[]}}, "none"=>{"names"=>{"names[type]"=>[], "names[name]"=>[], "names[rel]"=>[]}}, "110$t present OR 240 field present"=>{"this_work"=>{"this_work[author]"=>[]}}}}], ["111", {"ALL"=>{"$t present"=>{"this_work"=>{"this_work[title]"=>[]}, nil=>{"this_work[type]"=>[]}}, "none"=>{"names"=>{"names[ty

Then we get the Argot field data we want to merge in...

In [747]:
fields_h.each { |h| puts "#{fields_h.index(h)} -- #{h}"}
puts ''

0 -- argot_field
1 -- has parent
2 -- is parent?
3 -- vernacular treatment
4 -- vernacular status
5 -- type
6 -- argot-marc processor/pattern
7 -- category
8 -- provisional
9 -- responsibility
10 -- obligation
11 -- searchable in
12 -- retain order
13 -- facet
14 -- Brief display
15 -- Full display
16 -- note on display
17 -- definition
18 -- rationale
19 -- relevance importance (1=most imp)
20 -- endeca equivalent
21 -- notes
22 -- implementation details
23 -- documentation
24 -- JIRA issue
25 -- issue ct
26 -- mapping ct
27 -- done in mta?
28 -- tests?
29 -- in schema?



Keep only the columns we need...

In [748]:
field_hash = {}
fields_d.each do |f|
field_hash[f[0]] = {:indexes => f[11],
                    :facet => f[13],
                    :disp_b => f[14],
                    :disp_f => f[15],
                    :disp_note => f[16],
                    :doc => f[23]}  
end
puts ''




Then we populate the hash with the details...

In [750]:
fix_tags.each do |m|
  this_mapping = { :subfields => m[5],
                   :processing_type => m[7],
                   :processing_inst => m[8],
                   :notes => m[9],
                   :provisional => m[2]}
  map_hash[m[4]][m[3]][m[6]][m[0]][m[1]] << this_mapping
end
map_hash.first(5).each { |h| puts pp(h) }
puts ''

["001",
 {"UNC"=>
   {"complex contraint logic"=>
     {"oclc_number"=>
       {"oclc_number[value]"=>
         [{:subfields=>"{whole field}",
           :processing_type=>"complex",
           :processing_inst=>"see notes",
           :notes=>
            "https://github.com/trln/marc-to-argot/blob/master/spec/unc_oclcnum_spec.rb",
           :provisional=>"n"}]}},
    "value =~ /^ss([ej]|[ie]b)\\d+$/"=>
     {"sersol_number"=>
       {"sersol_number"=>
         [{:subfields=>"{whole field}",
           :processing_type=>"subelement_to_value",
           :processing_inst=>"x",
           :notes=>"x",
           :provisional=>"n"}]}},
    "value does not meet criteria to be OCLC Number or Sersol Number"=>
     {"vendor_marc_id"=>
       {"vendor_marc_id"=>
         [{:subfields=>"{whole field}",
           :processing_type=>"subelement_to_value",
           :processing_inst=>"x",
           :notes=>"x",
           :provisional=>"n"}]}}}}]
["001", {"UNC"=>{"complex contraint logic"=>{"o

           :provisional=>"n"}]}},
    "110$t present OR 240 field present"=>
     {"this_work"=>
       {"this_work[author]"=>
         [{:subfields=>"abc(d)(g)(n)u",
           :processing_type=>"concat_subelements",
           :processing_inst=>"See linked documentation",
           :notes=>
            "https://github.com/trln/data-documentation/blob/master/argot/spec_docs/this_work.adoc",
           :provisional=>"n"}]}}}}]
["110", {"ALL"=>{"$t present"=>{"this_work"=>{"this_work[title]"=>[{:subfields=>"(d)f(g)kl(n)pt", :processing_type=>"array_from_subelements", :processing_inst=>"See linked documentation", :notes=>"https://github.com/trln/data-documentation/blob/master/argot/spec_docs/this_work.adoc", :provisional=>"n"}]}, nil=>{"this_work[type]"=>[{:subfields=>"{na}", :processing_type=>"constant", :processing_inst=>"\"type\":\"this\"", :notes=>"https://github.com/trln/data-documentation/blob/master/argot/spec_docs/this_work.adoc", :provisional=>"n"}]}}, "none"=>{"names"=>{"names

From this, we generate a human-readable report in HTML. Here, we create the HTML file and all the parts of it that come before the dump of data.

In [751]:
hfile = File.new("marc_mappings.html", "w")
hfile.puts('<HTML>')
hfile.puts('<HEAD>')
hfile.puts('<title>Mappings from MARC to Argot</title>')
hfile.puts("<style media='all' type='text/css'>")
hfile.puts("body {font-family: Helvetica Neue, sans-serif;}")
hfile.puts("h2 {text-indent: 1em;}")
hfile.puts("h3 {text-indent: 2em;}")
hfile.puts("h4 {text-indent: 3em;}")
hfile.puts(".mapping {border: 1px dotted gray; margin-left: 4em; margin-bottom: 1em; padding: 0.5em;}")
hfile.puts(".provisional {font-variant: small-caps; color: red; justify-content: center;}")
hfile.puts("</style>")
hfile.puts('</HEAD>')
hfile.puts('<BODY>')
hfile.puts('<h1>Mappings from MARC to Argot</h1>')


Then we do a bunch of mess to translate the hash to HTML...

In [752]:
map_hash.each do |tag, insthash|
  hfile.puts("<h2>#{tag}</h2>")
  insthash.each do |inst, constrainthash|
    hfile.puts("<h3>Mappings for #{inst}</h3>")
    constrainthash.each do |constraint, afieldhash|
      if constraint == 'none'
        hfile.puts('<h4>IN ALL CASES...</h4>')
      else
        hfile.puts("<h4>WHEN #{constraint} THEN...</h4>")
      end
      afieldhash.each do |afield, aelementhash|
        doc = field_hash[afield][:doc]
        doclink = doc if doc.start_with?('http')
        docneeded = 'y' if doc.start_with?('needed')
        
        
        aelementhash.each do |aelement, mappings|
          search_in = field_hash[aelement][:indexes] unless field_hash[aelement][:indexes] == 'not indexed'
          facet = field_hash[aelement][:facet] unless field_hash[aelement][:facet] == 'x'
          disp_b = "Displayed in brief record: #{field_hash[aelement][:disp_b]}" unless field_hash[aelement][:disp_b] == 'x'
          disp_f = "Displayed in full record: #{field_hash[aelement][:disp_f]}" unless field_hash[aelement][:disp_f] == 'x'
          disp_n = "Notes on display: #{field_hash[aelement][:disp_note]}" unless field_hash[aelement][:disp_note] == 'x'
          mappings.each do |m|
            hfile.puts('<div class="mapping">')
            if m[:provisional] == 'y'
              hfile.puts('<div class="provisional">Provisional mapping</div>')
            end
            if m[:processing_type] == 'constant'
              hfile.puts("<b>Constant value: #{aelement} = #{m[:processing_inst].split(':')[1].gsub('"', '')}</b>")
            elsif m[:processing_type] == 'DO NOT SET'
              hfile.puts("<b>Do not set #{aelement} value</b>")
            else
              hfile.puts("<b>#{m[:subfields]} => #{aelement}</b>")
            end
            
            case m[:processing_type]
              when 'array_from_subelements'
               xform_type = 'Contents of each subfield becomes separate element in array. <i>Example: $a cat $b dog $c fish => ["cat", "dog", "fish"]</i>'
              when 'concat_subelements'
               xform_type = 'Contents of subfields joined together into one string, separated by space. <i>Example: $a cat $b dog $c fish => "cat dog fish"</i>'
              when 'map subelement to value'
               xform_type = 'Subfield value is a code, which is translated into a human readable value and becomes element in array. <i>Example: $a eng $b ger=> ["English", "German"]</i>'
              when 'map indicator value'
               xform_type = 'Indicator value is translated into human readable value. <i>Example: 246 i2=4 => "Cover title"</i>'
            end
            
            unless m[:processing_type] == 'constant'
              hfile.puts("<br />Processing method: #{xform_type}") if xform_type
              hfile.puts("<br />Special mapping instructions: #{m[:processing_inst]}") unless m[:processing_inst] =~ /^(See linked|x)/
            end
            
            hfile.puts("<br />&nbsp;<br />")
            
            if search_in
              hfile.puts("Searchable as: #{search_in.split(';;;').join(', ')}")
            else
              hfile.puts("<br />Not searchable")
            end
            hfile.puts("<br />#{disp_b}") if disp_b
            hfile.puts("<br />#{disp_f}") if disp_f
            hfile.puts("<br />#{disp_n}") if disp_n
            
            hfile.puts("<br />&nbsp;<br />")
            
            if doclink
              hfile.puts("For details, see <a href=\"#{doclink}\">documentation on Argot field: #{afield}</a>")
            elsif docneeded == 'y'
              hfile.puts("More details forthcoming in documentation to be written for Argot field: #{afield}")
            end
            
            hfile.puts('</div>')
          end
        end
      end
    end
  end
end

NoMethodError: undefined method `[]' for nil:NilClass

In [753]:
hfile.puts('</BODY>')
hfile.puts('</HTML>')
hfile.close