# MARC mappings
This notebook generates the mapping of MARC to Argot, the data language used by TRLN Discovery, and lets you view it.

First I need to bring in the mappings data from the Excel file that lives at https://github.com/trln/data-documentation/blob/master/argot/argot.xlsx 



In [23]:
require 'pp'
require 'simple_xlsx_reader'
doc = SimpleXlsxReader.open('../argot/argot.xlsx')
mappings_d = doc.sheets[1].data
mappings_h = doc.sheets[1].headers
puts ''





Then I want to see my headers along with the column number for each:

In [22]:
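# each_with_index supplies the column number directly; index(h) would rescan the
# array for every header and report the wrong number for any repeated header name
mappings_h.each_with_index { |h, i| puts "#{i} -- #{h}" }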
puts ''

0 -- parentfield
1 -- field
2 -- source data format
3 -- provisional?
4 -- institution
5 -- element/field
6 -- subelement/field(s)
7 -- constraints
8 -- processing_type
9 -- processing instructions
10 -- notes
11 -- mapping_id
12 -- mapping issue ct
13 -- field issue ct
14 -- field defined?
15 -- done in mta?
16 -- tests done?
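
Numeric column positions are easy to mix up later on, so here is a minimal sketch of looking columns up by header name instead. The `headers` and `row` values are copied from this notebook's output and trimmed to the first eight columns; `col` is a name I made up for the lookup hash.

```ruby
# First eight headers, copied from the listing above.
headers = ['parentfield', 'field', 'source data format', 'provisional?',
           'institution', 'element/field', 'subelement/field(s)', 'constraints']

# Build a header name => column index lookup (hypothetical helper name: col).
col = headers.each_with_index.to_h

# A sample row trimmed to the same eight columns.
row = ['donor', 'donor[value]', 'MARC', 'n', 'UNC', '790', 'abcdgqu', 'ind 1 = 0']

puts row[col['element/field']]  # the MARC tag
puts row[col['institution']]    # the institution code
```

With something like this in place, `r[col['source data format']] == 'MARC'` says the same thing as `r[2] == 'MARC'` but survives a column being added to the spreadsheet.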



Then, I throw out mappings that are not from standard MARC source data and take a look at the data.

In [35]:
marc_only = mappings_d.select{ |r| r[2] == "MARC"}
puts "Number of mappings: #{marc_only.size}"
# p prints each row in inspect form (brackets, quotes, nil); puts would print one element per line
marc_only.first(3).each { |m| p m }
puts ''

Number of mappings: 669
["derived_work_note", "derived_work_note", "MARC", "n", "GEN", "581", "az3", "none", "see notes", "$3 label; prepend ISBN to $z value", "x", "derived_work_noteGEN581az3none", "0", "0", "n", nil, nil]
["donor", "donor[value]", "MARC", "n", "UNC", "790", "abcdgqu", "ind 1 = 0", "concat_subelements", "Prepend: \"Donated by \"", ".", "donor[value]UNC790abcdgquind 1 = 0", "0", "0", "n", "y", "y"]
["donor", "donor[value]", "MARC", "n", "UNC", "791", "abcdfg", "ind 1 = 2", "concat_subelements", "Prepend: \"Purchased using funds from the \"", ".", "donor[value]UNC791abcdfgind 1 = 2", "0", "0", "n", "y", "y"]
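
Before discarding rows it can help to see which source data formats are present and how many rows each has. This is only a sketch on made-up rows: `'OTHER'` and `example_field` are stand-ins I invented, since the non-MARC values in the spreadsheet are not shown here.

```ruby
# Hypothetical rows; only the third column (source data format) matters here.
# 'OTHER' is an invented stand-in for whatever non-MARC values the sheet holds.
rows = [
  ['donor', 'donor[value]', 'MARC'],
  ['misc_id', 'misc_id[value]', 'MARC'],
  ['example_field', 'example_field', 'OTHER']
]

# Count rows per source data format.
format_counts = rows.each_with_object(Hash.new(0)) { |r, counts| counts[r[2]] += 1 }
format_counts.each { |fmt, n| puts "#{fmt}: #{n}" }
```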



There's a lot of info here we don't need, so I'm going to keep only these columns:
* 0 -- parentfield
* 1 -- field
* 3 -- provisional?
* 4 -- institution
* 5 -- element/field
* 6 -- subelement/field(s)
* 7 -- constraints

In [41]:
less_data = marc_only.map { |m| [m[0], m[1], m[3], m[4], m[5], m[6], m[7]]}
# p prints each row in inspect form; puts would print one element per line
less_data.first(3).each { |m| p m }
puts ''

["derived_work_note", "derived_work_note", "n", "GEN", "581", "az3", "none"]
["donor", "donor[value]", "n", "UNC", "790", "abcdgqu", "ind 1 = 0"]
["donor", "donor[value]", "n", "UNC", "791", "abcdfg", "ind 1 = 2"]
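
The same column selection can be written with `Array#values_at`, which keeps the list of wanted indexes in one place. A small sketch, using the first nine columns of one of the sample rows above; `KEEP` is a constant name I chose for illustration.

```ruby
# Indexes of the columns to keep, matching the bullet list above.
KEEP = [0, 1, 3, 4, 5, 6, 7].freeze

# First nine columns of a sample row from the earlier output.
row = ['donor', 'donor[value]', 'MARC', 'n', 'UNC', '790',
       'abcdgqu', 'ind 1 = 0', 'concat_subelements']

# values_at returns the elements at the given indexes, in the order given.
trimmed = row.values_at(*KEEP)
p trimmed
```

Applied to the whole data set, this would read `marc_only.map { |m| m.values_at(*KEEP) }`.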



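Sort by MARC tag and print out the first 20. The tags are still unpadded strings at this point, so they are compared character by character: "1", "10", "100" and "110" all sort ahead of any tag that starts with "2".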

In [43]:
less_data.sort! {|a,b| a[4] <=> b[4]}
# p prints each row in inspect form; puts would print one element per line
less_data.first(20).each { |m| p m }
puts ''

["oclc_number", "oclc_number[value]", "n", "UNC", "1", ".", "see notes"]
["sersol_number", "sersol_number", "n", "UNC", "1", "{na}", "value =~ /^ss([ej]|[ie]b)\\d+$/"]
["vendor_marc_id", "vendor_marc_id", "n", "UNC", "1", "{na}", "value does not meet criteria to be OCLC Number or Sersol Number"]
["misc_id", "misc_id[value]", "n", "GEN", "10", "a", "none"]
["misc_id", "misc_id[value]", "n", "GEN", "10", "b", "none"]
["misc_id", "misc_id[value]", "n", "GEN", "10", "z", "none"]
["this_work", "this_work[author]", "n", "GEN", "100", "abcd(g)jqu", "none"]
["names", "names[name]", "n", "GEN", "100", "abcdgjqu", "none"]
["names", "names[rel]", "n", "GEN", "100", "e4", "none"]
["this_work", "this_work[title]", "n", "GEN", "100", "f(g)hklnpt", "100$t"]
["names", "names[type]", "n", "GEN", "100", "{na}", "none"]
["this_work", "this_work[title]", "n", "GEN", "110", "(d)f(g)kl(n)pt", "110$t"]
["this_work", "this_work[author]", "n", "GEN", "110", "abc(d)(g)(n)u", "none"]
["names", "names[name]", "n"

MARC tags that begin with zeros have lost them (for example "1" instead of "001"), so let's pad every tag back out to three characters.

In [44]:
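# each returns its receiver, so fix_tags is the same array object as less_data;
# the rows are padded in place
fix_tags = less_data.each { |m| m[4] = m[4].rjust(3, '0') }
fix_tags.first(5).each { |m| p m }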
puts ''

["oclc_number", "oclc_number[value]", "n", "UNC", "001", ".", "see notes"]
["sersol_number", "sersol_number", "n", "UNC", "001", "{na}", "value =~ /^ss([ej]|[ie]b)\\d+$/"]
["vendor_marc_id", "vendor_marc_id", "n", "UNC", "001", "{na}", "value does not meet criteria to be OCLC Number or Sersol Number"]
["misc_id", "misc_id[value]", "n", "GEN", "010", "a", "none"]
["misc_id", "misc_id[value]", "n", "GEN", "010", "b", "none"]
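
One thing to watch: the sort ran on the unpadded strings, and padding can change the order. The sketch below shows the effect on a handful of tags. The tag list is illustrative only; I have not checked that every one of these tags appears in the spreadsheet.

```ruby
# Illustrative unpadded tags, already in plain string-sort order.
tags = %w[1 10 100 110 20 245]

# Pad to three characters, as in the cell above.
padded = tags.map { |t| t.rjust(3, '0') }

p padded       # order carried over from the unpadded sort
p padded.sort  # order after sorting the padded strings again
```

If tag order matters for the final view, sorting again after padding, for example with `less_data.sort_by! { |m| m[4] }`, would be one way to restore it.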



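We get rid of mappings where a constant value is provided. As written, this filter removes nothing: the count below is still 669, the same as before filtering. Column 5 of `marc_only` holds the MARC tag, and the only `na`-like value in the sample rows above is `{na}` in the subelement/field(s) column, so the comparison `r[5] == 'na'` never matches.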

In [30]:
no_constants = marc_only.reject{ |r| r[5] == 'na'}
puts "Number of mappings: #{no_constants.size}"

Number of mappings: 669
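
Here is a rough sketch of a filter that does match something, run on three of the trimmed rows shown earlier. It assumes that `{na}` in the subelement/field(s) column (index 5 of the trimmed rows) is how a constant is marked. That is my guess from the sample output, not something the spreadsheet confirms, and rows taken from control fields such as 001 also carry `{na}`, so the test may catch more than constants.

```ruby
# Three trimmed rows copied from the sorted output above.
rows = [
  ['names', 'names[name]', 'n', 'GEN', '100', 'abcdgjqu', 'none'],
  ['names', 'names[rel]',  'n', 'GEN', '100', 'e4',       'none'],
  ['names', 'names[type]', 'n', 'GEN', '100', '{na}',     'none']
]

# ASSUMPTION: '{na}' in the subelement column marks a constant-value mapping.
# partition keeps both groups so the removed rows can be inspected.
constants, no_constants = rows.partition { |r| r[5] == '{na}' }

puts "Number of mappings: #{no_constants.size}"
puts "Removed: #{constants.size}"
```

Printing the removed count next to the kept count makes it obvious when a filter has matched nothing.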
