# MARC mappings
This notebook generates and allows you to view the mapping of MARC to the Argot data language used by TRLN Discovery.

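First I need to bring in the mappings data from the Excel file that lives at https://github.com/trln/data-documentation/blob/master/argot/argot.xlsx. The cell below reads it from a local copy (`../argot/argot.xlsx`) and takes the second worksheet (`sheets[1]`).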



In [214]:
require 'pp'
require 'simple_xlsx_reader'
doc = SimpleXlsxReader.open('../argot/argot.xlsx')
mappings_d = doc.sheets[1].data
mappings_h = doc.sheets[1].headers
puts ''





Then I want to see my headers along with the column number for each:

In [215]:
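# each_with_index gives the column number directly (index(h) would repeat the first match for duplicate headers)
mappings_h.each_with_index { |h, i| puts "#{i} -- #{h}" }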
puts ''

0 -- parentfield
1 -- field
2 -- source data format
3 -- provisional?
4 -- institution
5 -- element/field
6 -- subelement/field(s)
7 -- constraints
8 -- processing_type
9 -- processing instructions
10 -- notes
11 -- mapping_id
12 -- mapping issue ct
13 -- field issue ct
14 -- field defined?
15 -- done in mta?
16 -- tests done?
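
Since the later cells address columns by number, a name-to-number lookup can make them easier to read. This is only a sketch: the header list is shortened for illustration, and `HEADERS` and `COL` are names I made up, not part of the notebook.

```ruby
# Shortened, illustrative header row; the real one comes from mappings_h.
HEADERS = ['parentfield', 'field', 'source data format', 'provisional?', 'institution'].freeze

# Build a { header name => column number } lookup.
COL = HEADERS.each_with_index.to_h

# Sample row, cut down to the same five columns.
row = ['included_work', 'included_work[title]', 'MARC', 'n', 'GEN']
puts row[COL['institution']]          # prints GEN
puts row[COL['source data format']]   # prints MARC
```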



Then, I throw out mappings that are not from standard MARC source data and take a look at the data.

In [216]:
marc_only = mappings_d.select{ |r| r[2] == "MARC"}
puts "Number of mappings: #{marc_only.size}"
marc_only.first(3).each { |m| puts m}
puts ''

Number of mappings: 665
["included_work", "included_work[title]", "MARC", "n", "GEN", "700", "f(g)hklmnoprst", "i2=2 AND ($t OR $k)", "array_from_subelements", "x", "https://github.com/trln/data-documentation/blob/master/argot/spec_docs/included_work.adoc", "included_work[title]GEN700f(g)hklmnoprsti2=2 AND ($t OR $k)", "0", "0", "n", "n", "y"]
["included_work", "included_work[title]", "MARC", "n", "GEN", "710", "(d)f(g)hklm(n)oprst", "i2=2 AND ($t OR $k)", "array_from_subelements", "$d, $n, $g included in [title] if they follow $t or $k", "https://github.com/trln/data-documentation/blob/master/argot/spec_docs/included_work.adoc", "included_work[title]GEN710(d)f(g)hklm(n)oprsti2=2 AND ($t OR $k)", "0", "0", "n", "n", "y"]
["included_work", "included_work[title]", "MARC", "n", "GEN", "711", "(d)f(g)hklm(n)pst", "i2=2 AND ($t OR $k)", "array_from_subelements", "$d, $g, $n included if they follow $t or $k", "https://github.com/trln/data-documentation/blob/master/argot/spec_docs/included_wo

There's a lot of info here we don't need, so I'm going to keep only what we need: 
* 0 -- parentfield
* 1 -- field
* 3 -- provisional?
* 4 -- institution
* 5 -- element/field
* 6 -- subelement/field(s)
* 7 -- constraints
* 8 -- processing_type
* 9 -- processing instructions
* 10 -- notes
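
Dropping columns renumbers the ones that remain, which is easy to lose track of in the later cells. The sketch below shows the same trim done with `Array#values_at`, plus a lookup from old column number to new position. `KEEP`, `NEW_INDEX` and `trim_row` are hypothetical names, and the row is a stand-in rather than real mapping data.

```ruby
# Column numbers kept from the full 17-column row (same as the list above).
KEEP = [0, 1, 3, 4, 5, 6, 7, 8, 9, 10].freeze

# Old column number => position in the trimmed row.
NEW_INDEX = KEEP.each_with_index.to_h

# Hypothetical helper: trim one full row down to the kept columns.
def trim_row(row)
  row.values_at(*KEEP)
end

full_row = (0..16).map { |i| "col#{i}" }   # stand-in for a real mapping row
trimmed = trim_row(full_row)

puts trimmed.size    # prints 10
puts NEW_INDEX[3]    # prints 2 (provisional? moves from column 3 to position 2)
puts NEW_INDEX[7]    # prints 6 (constraints moves from column 7 to position 6)
```

After the trim, valid positions run from 0 to 9, so the old column numbers no longer apply.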

In [217]:
less_data = marc_only.map { |m| [m[0], m[1], m[3], m[4], m[5], m[6], m[7], m[8], m[9], m[10]]}
less_data.first(3).each { |m| puts m}
puts ''

["included_work", "included_work[title]", "n", "GEN", "700", "f(g)hklmnoprst", "i2=2 AND ($t OR $k)", "array_from_subelements", "x", "https://github.com/trln/data-documentation/blob/master/argot/spec_docs/included_work.adoc"]
["included_work", "included_work[title]", "n", "GEN", "710", "(d)f(g)hklm(n)oprst", "i2=2 AND ($t OR $k)", "array_from_subelements", "$d, $n, $g included in [title] if they follow $t or $k", "https://github.com/trln/data-documentation/blob/master/argot/spec_docs/included_work.adoc"]
["included_work", "included_work[title]", "n", "GEN", "711", "(d)f(g)hklm(n)pst", "i2=2 AND ($t OR $k)", "array_from_subelements", "$d, $g, $n included if they follow $t or $k", "https://github.com/trln/data-documentation/blob/master/argot/spec_docs/included_work.adoc"]



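Sort by MARC tag and print out the first 10. The tags are still unpadded strings at this point, so the order is lexicographic ("1", "10", "100", ...) rather than numeric.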

In [218]:
less_data.sort! {|a,b| a[4] <=> b[4]}
less_data.first(10).each { |m| puts m }
puts ''

["oclc_number", "oclc_number[value]", "n", "UNC", "1", "{whole field}", "complex contraint logic", "complex", "see notes", "https://github.com/trln/marc-to-argot/blob/master/spec/unc_oclcnum_spec.rb"]
["sersol_number", "sersol_number", "n", "UNC", "1", "{na}", "value =~ /^ss([ej]|[ie]b)\\d+$/", "subelement_to_value", "x", "x"]
["vendor_marc_id", "vendor_marc_id", "n", "UNC", "1", "{na}", "value does not meet criteria to be OCLC Number or Sersol Number", "subelement_to_value", "x", "x"]
["misc_id", "misc_id[value]", "n", "GEN", "10", "a", "none", "subelement_to_value", "misc_id[qual]   = \"\";;;misc_id[type] = \"LCCN\"", "x"]
["misc_id", "misc_id[value]", "n", "GEN", "10", "b", "none", "subelement_to_value", "misc_id[qual]  = \"\";;;misc_id[type] = \"NUCMC\"", "x"]
["misc_id", "misc_id[value]", "n", "GEN", "10", "z", "none", "subelement_to_value", "misc_id[qual]  = \"\";;;misc_id[type] = \"Canceled or invalid LCCN\"", "x"]
["this_work", "this_work[title]", "n", "GEN", "100", "f(g)hklnpt

MARC tags that begin with zeros have lost them (001 shows up as "1", 010 as "10"), so let's pad every tag back out to three characters.

In [219]:
fix_tags = less_data.each { |m| m[4] = m[4].rjust(3,'0') }
fix_tags.first(5).each { |m| puts m}
puts ''

["oclc_number", "oclc_number[value]", "n", "UNC", "001", "{whole field}", "complex contraint logic", "complex", "see notes", "https://github.com/trln/marc-to-argot/blob/master/spec/unc_oclcnum_spec.rb"]
["sersol_number", "sersol_number", "n", "UNC", "001", "{na}", "value =~ /^ss([ej]|[ie]b)\\d+$/", "subelement_to_value", "x", "x"]
["vendor_marc_id", "vendor_marc_id", "n", "UNC", "001", "{na}", "value does not meet criteria to be OCLC Number or Sersol Number", "subelement_to_value", "x", "x"]
["misc_id", "misc_id[value]", "n", "GEN", "010", "a", "none", "subelement_to_value", "misc_id[qual]   = \"\";;;misc_id[type] = \"LCCN\"", "x"]
["misc_id", "misc_id[value]", "n", "GEN", "010", "b", "none", "subelement_to_value", "misc_id[qual]  = \"\";;;misc_id[type] = \"NUCMC\"", "x"]
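
Padding also affects ordering. The notebook sorts before it pads, so a two-digit tag such as 20 (if the data has one) would still sit after 100 in `less_data`. Here is a small sketch with made-up tag values; `pad_tag` is a hypothetical helper.

```ruby
# Hypothetical helper: pad a tag out to three characters with leading zeros.
def pad_tag(tag)
  tag.to_s.rjust(3, '0')
end

# Made-up tag values with their leading zeros missing.
raw_tags = %w[100 20 1 245 10]

# Sorting the raw strings compares them character by character.
puts raw_tags.sort.inspect                          # prints ["1", "10", "100", "20", "245"]

# Padding first and then sorting gives MARC tag order.
puts raw_tags.map { |t| pad_tag(t) }.sort.inspect   # prints ["001", "010", "020", "100", "245"]
```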



Change GEN to ALL in the institution element of each row.

In [220]:
inst_fix = fix_tags.each { |m| m[3] = m[3].sub('GEN', 'ALL') }
inst_fix.first(5).each { |m| puts m}
puts ''

["oclc_number", "oclc_number[value]", "n", "UNC", "001", "{whole field}", "complex contraint logic", "complex", "see notes", "https://github.com/trln/marc-to-argot/blob/master/spec/unc_oclcnum_spec.rb"]
["sersol_number", "sersol_number", "n", "UNC", "001", "{na}", "value =~ /^ss([ej]|[ie]b)\\d+$/", "subelement_to_value", "x", "x"]
["vendor_marc_id", "vendor_marc_id", "n", "UNC", "001", "{na}", "value does not meet criteria to be OCLC Number or Sersol Number", "subelement_to_value", "x", "x"]
["misc_id", "misc_id[value]", "n", "ALL", "010", "a", "none", "subelement_to_value", "misc_id[qual]   = \"\";;;misc_id[type] = \"LCCN\"", "x"]
["misc_id", "misc_id[value]", "n", "ALL", "010", "b", "none", "subelement_to_value", "misc_id[qual]  = \"\";;;misc_id[type] = \"NUCMC\"", "x"]



Now we create a more complex data structure out of our mappings, so that we can report out on them in a structured way. The hash we create below will look like: 

<pre>
{MARC TAG =>
  {INSTITUTION =>
    {ARGOT FIELD =>
      {CONSTRAINT =>
        {ARGOT ELEMENT => [ { :subfields => x,
                              :processing_type => x,
                              :processing_inst => x,
                              :notes => x,
                              :provisional => x } ]
        }
      }
    }
  }
}
</pre>
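
To make the shape concrete, here is a hand-built sample with a single mapping, using values from the 010 $a row printed earlier. Each Argot element holds a list of mapping hashes, which is how the population cell later in the notebook fills it. `SAMPLE` is an illustrative name, and `Hash#dig` needs Ruby 2.3 or newer.

```ruby
# Hand-built sample holding one mapping.
SAMPLE = {
  '010' => {
    'ALL' => {
      'misc_id' => {
        'none' => {
          'misc_id[value]' => [
            { subfields: 'a', processing_type: 'subelement_to_value' }
          ]
        }
      }
    }
  }
}.freeze

# Hash#dig walks the levels and returns nil if any key is missing.
mappings = SAMPLE.dig('010', 'ALL', 'misc_id', 'none', 'misc_id[value]')
puts mappings.first[:subfields]          # prints a
p SAMPLE.dig('010', 'UNC', 'misc_id')    # prints nil
```
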
First we set up the MARC TAGs for population...

In [221]:
map_hash = {}
fix_tags.each do |m|
  map_hash[m[4]] = {} unless map_hash.has_key?(m[4])
end
map_hash.first(5).each { |h| puts h }

["001", {}]
["010", {}]
["100", {}]
["110", {}]
["111", {}]



Then we set up the INSTITUTION level structure for population...

In [222]:
fix_tags.each do |m|
  map_hash[m[4]][m[3]] = {} unless map_hash[m[4]].has_key?(m[3])
end
map_hash.first(5).each { |h| puts h }

["001", {"UNC"=>{}}]
["010", {"ALL"=>{}}]
["100", {"ALL"=>{}}]
["110", {"ALL"=>{}}]
["111", {"ALL"=>{}}]



Then we set up the ARGOT FIELD level structure for population...

In [223]:
fix_tags.each do |m|
  map_hash[m[4]][m[3]][m[0]] = {} unless map_hash[m[4]][m[3]].has_key?(m[0])
end
map_hash.first(5).each { |h| puts h }

["001", {"UNC"=>{"oclc_number"=>{}, "sersol_number"=>{}, "vendor_marc_id"=>{}}}]
["010", {"ALL"=>{"misc_id"=>{}}}]
["100", {"ALL"=>{"this_work"=>{}, "names"=>{}}}]
["110", {"ALL"=>{"this_work"=>{}, "names"=>{}}}]
["111", {"ALL"=>{"this_work"=>{}, "names"=>{}}}]



Then we set up the CONSTRAINT level structure for population...

In [224]:
fix_tags.each do |m|
  map_hash[m[4]][m[3]][m[0]][m[6]] = {} unless map_hash[m[4]][m[3]][m[0]].has_key?(m[6])
end
map_hash.first(5).each { |h| puts h }

["001", {"UNC"=>{"oclc_number"=>{"complex contraint logic"=>{}}, "sersol_number"=>{"value =~ /^ss([ej]|[ie]b)\\d+$/"=>{}}, "vendor_marc_id"=>{"value does not meet criteria to be OCLC Number or Sersol Number"=>{}}}}]
["010", {"ALL"=>{"misc_id"=>{"none"=>{}}}}]
["100", {"ALL"=>{"this_work"=>{"100$t"=>{}, "none"=>{}}, "names"=>{"none"=>{}}}}]
["110", {"ALL"=>{"this_work"=>{"110$t"=>{}, "none"=>{}}, "names"=>{"none"=>{}}}}]
["111", {"ALL"=>{"this_work"=>{"111$t"=>{}, "none"=>{}}, "names"=>{"none"=>{}}}}]



Then we set up the ARGOT ELEMENT level structure for population...

In [225]:
fix_tags.each do |m|
  map_hash[m[4]][m[3]][m[0]][m[6]][m[1]] = [] unless map_hash[m[4]][m[3]][m[0]][m[6]].has_key?(m[1])
end
map_hash.first(5).each { |h| puts h }

["001", {"UNC"=>{"oclc_number"=>{"complex contraint logic"=>{"oclc_number[value]"=>[]}}, "sersol_number"=>{"value =~ /^ss([ej]|[ie]b)\\d+$/"=>{"sersol_number"=>[]}}, "vendor_marc_id"=>{"value does not meet criteria to be OCLC Number or Sersol Number"=>{"vendor_marc_id"=>[]}}}}]
["010", {"ALL"=>{"misc_id"=>{"none"=>{"misc_id[value]"=>[]}}}}]
["100", {"ALL"=>{"this_work"=>{"100$t"=>{"this_work[title]"=>[]}, "none"=>{"this_work[author]"=>[]}}, "names"=>{"none"=>{"names[name]"=>[], "names[rel]"=>[], "names[type]"=>[]}}}}]
["110", {"ALL"=>{"this_work"=>{"110$t"=>{"this_work[title]"=>[]}, "none"=>{"this_work[author]"=>[]}}, "names"=>{"none"=>{"names[name]"=>[], "names[rel]"=>[], "names[type]"=>[]}}}}]
["111", {"ALL"=>{"this_work"=>{"111$t"=>{"this_work[title]"=>[]}, "none"=>{"this_work[author]"=>[]}}, "names"=>{"none"=>{"names[name]"=>[], "names[rel]"=>[], "names[type]"=>[]}}}}]



Then we populate the hash with the details...

In [226]:
fix_tags.each do |m|
  # The trimmed rows have positions 0-9; provisional? sits at position 2
  # (m[10] is out of range and would always be nil).
  this_mapping = { :subfields => m[5],
                   :processing_type => m[7],
                   :processing_inst => m[8],
                   :notes => m[9],
                   :provisional => m[2]}
  map_hash[m[4]][m[3]][m[0]][m[6]][m[1]] << this_mapping
end
map_hash.first(5).each { |h| puts h }
puts ''

["001", {"UNC"=>{"oclc_number"=>{"complex contraint logic"=>{"oclc_number[value]"=>[{:subfields=>"{whole field}", :processing_type=>"complex", :processing_inst=>"see notes", :notes=>"https://github.com/trln/marc-to-argot/blob/master/spec/unc_oclcnum_spec.rb", :provisional=>"n"}]}}, "sersol_number"=>{"value =~ /^ss([ej]|[ie]b)\\d+$/"=>{"sersol_number"=>[{:subfields=>"{na}", :processing_type=>"subelement_to_value", :processing_inst=>"x", :notes=>"x", :provisional=>"n"}]}}, "vendor_marc_id"=>{"value does not meet criteria to be OCLC Number or Sersol Number"=>{"vendor_marc_id"=>[{:subfields=>"{na}", :processing_type=>"subelement_to_value", :processing_inst=>"x", :notes=>"x", :provisional=>"n"}]}}}}]
["010", {"ALL"=>{"misc_id"=>{"none"=>{"misc_id[value]"=>[{:subfields=>"a", :processing_type=>"subelement_to_value", :processing_inst=>"misc_id[qual]   = \"\";;;misc_id[type] = \"LCCN\"", :notes=>"x", :provisional=>"n"}, {:subfields=>"b", :processing_type=>"subelement_to_value", :processing_inst
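
The five set-up passes and the population pass could also be folded into one loop, creating each level only when it is first needed. This is a sketch rather than what the notebook runs: `build_map` is a hypothetical helper, it assumes the ten-position trimmed row layout, and it reads the provisional flag from position 2, where the trimmed rows carry it.

```ruby
# Hypothetical helper: build the nested hash in a single pass.
# Assumed row layout: [parentfield, field, provisional, institution, tag,
#                      subfields, constraint, processing_type, processing_inst, notes]
def build_map(rows)
  map = {}
  rows.each do |m|
    by_inst       = (map[m[4]] ||= {})            # MARC tag level
    by_field      = (by_inst[m[3]] ||= {})        # institution level
    by_constraint = (by_field[m[0]] ||= {})       # Argot field level
    by_element    = (by_constraint[m[6]] ||= {})  # constraint level
    (by_element[m[1]] ||= []) << {                # Argot element level
      subfields: m[5], processing_type: m[7],
      processing_inst: m[8], notes: m[9], provisional: m[2]
    }
  end
  map
end

# Two sample rows modelled on the 010 mappings shown earlier.
rows = [
  ['misc_id', 'misc_id[value]', 'n', 'ALL', '010', 'a', 'none', 'subelement_to_value', 'x', 'x'],
  ['misc_id', 'misc_id[value]', 'n', 'ALL', '010', 'b', 'none', 'subelement_to_value', 'x', 'x']
]
p build_map(rows).dig('010', 'ALL', 'misc_id', 'none', 'misc_id[value]').map { |h| h[:subfields] }
```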

From this, we generate a human-readable report in HTML. Here, we create the HTML file and all the parts of it that come before the dump of data.

In [233]:
hfile = File.new("marc_mappings.html", "w")
hfile.puts('<HTML>')
hfile.puts('<HEAD>')
hfile.puts('<title>Mappings from MARC to Argot</title>')
hfile.puts("<style media='all' type='text/css'>")
hfile.puts("body {font-family: Helvetica Neue, sans-serif;}")
hfile.puts("h2 {text-indent: 1em;}")
hfile.puts("h3 {text-indent: 2em;}")
hfile.puts("h4 {text-indent: 3em;}")
hfile.puts("</style>")
hfile.puts('</HEAD>')
hfile.puts('<BODY>')
hfile.puts('<h1>Mappings from MARC to Argot</h1>')


Then we walk the hash level by level and write each level out as HTML...

In [234]:
map_hash.each do |tag, insthash|
  hfile.puts("<h2>#{tag}</h2>")
  insthash.each do |inst, afieldhash|
    hfile.puts("<h3>Mappings for #{inst}</h3>")
    afieldhash.each do |afield, constrainthash|
      constrainthash.each do |constraint, aelementhash|
        # One heading per constraint; 'none' means the mapping always applies.
        if constraint == 'none'
          hfile.puts('<h4>IN ALL CASES...</h4>')
        else
          hfile.puts("<h4>WHEN #{constraint} THEN...</h4>")
        end
        aelementhash.each do |aelement, mappings|
          mappings.each do |m|
            line = '<p>'
            if m[:processing_type] == 'constant'
              # Constants have no source subfields, so show the instruction instead.
              line << "<b>#{afield}</b> #{m[:processing_inst]}"
            else
              line << "#{tag} subfields #{m[:subfields]} map to <b>#{aelement}</b>"
            end
            line << '</p>'
            hfile.puts line
          end
        end
      end
    end
  end
end

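
Constraint text is free-form and gets written straight into the HTML. If a constraint ever contained `<` or `&`, the page could render incorrectly, so escaping it may be worth considering. A minimal sketch using `CGI.escapeHTML` from Ruby's standard library: `constraint_heading` is a hypothetical helper and the last sample constraint is made up.

```ruby
require 'cgi'

# Hypothetical helper: build the constraint heading with its text escaped.
def constraint_heading(constraint)
  return '<h4>IN ALL CASES...</h4>' if constraint == 'none'

  "<h4>WHEN #{CGI.escapeHTML(constraint)} THEN...</h4>"
end

puts constraint_heading('none')
puts constraint_heading('i2=2 AND ($t OR $k)')
puts constraint_heading('i1 < 2 & $a')   # made-up constraint with special characters
```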

In [235]:
hfile.puts('</BODY>')
hfile.puts('</HTML>')
hfile.close
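
Because the file handle stays open across three cells, an error in the middle cell can leave `marc_mappings.html` open and incomplete. One alternative is the block form of `File.open`, which closes the file when the block exits. This sketch uses a hypothetical `write_report` helper and a temporary directory so it does not touch the real report.

```ruby
require 'tmpdir'

# Hypothetical helper: write a minimal report. The block form of File.open
# closes the file when the block exits, even if an exception is raised.
def write_report(path, body_lines)
  File.open(path, 'w') do |f|
    f.puts('<HTML>')
    f.puts('<BODY>')
    body_lines.each { |l| f.puts(l) }
    f.puts('</BODY>')
    f.puts('</HTML>')
  end
end

# Write into a throwaway directory and print the result.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'marc_mappings.html')
  write_report(path, ['<h1>Mappings from MARC to Argot</h1>'])
  puts File.read(path)
end
```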