# Unmapped MARC fields and subfields
This notebook:
* Documents MARC fields and subfields omitted from Argot mappings
* Identifies MARC mapping decisions to be made, by analyzing:
  * the current MARC specification from http://www.loc.gov/marc/bibliographic/ecbdlist.html
  * Argot mappings from https://github.com/trln/data-documentation/blob/master/argot/argot.xlsx
  * our explicit field and subfield omission decisions

MARC fields we have decided to exclude from Argot mappings are recorded in [`unmapped_marc_tags.json`](https://github.com/trln/data-documentation/blob/master/marc/unmapped_marc_tags.json).

Specific MARC subfields excluded from Argot mappings *when some subfields from the field ARE mapped* are recorded in [`unmapped_marc_subfields.json`](https://github.com/trln/data-documentation/blob/master/marc/unmapped_marc_subfields.json).

## Use the current MARC specification
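First, run the script that generates a JSON hash of the current MARC specification. If successful:
* Out will be `true` (or `false` if the script was already loaded earlier in this session, because `require` loads a file only once)
* `marc_definition.json` is written to the `data-documentation/marc` folder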

In [20]:
require './extract_marc_bib_spec'

false

## Get all our mappings data
This next bit just reads in all the data we'll need: 
* `marc_spec` is the updated MARC specification created in the previous step
* `unmapped_fields` holds our explicit decisions about MARC fields not mapped to Argot
* `unmapped_subfields` holds our explicit decisions about individual MARC subfields not mapped to Argot. These subfields all occur in MARC fields that are otherwise mapped to Argot.

In [21]:
require 'json'
marc_spec = JSON.parse(File.read('marc_definition.json'))
unmapped_fields = JSON.parse(File.read('unmapped_marc_tags.json'))
unmapped_subfields = JSON.parse(File.read('unmapped_marc_subfields.json'))

{"567"=>{"b"=>{"omission_type"=>"temporary", "omission_reason_category"=>"low priority", "omission_details"=>"UNC has no data in this subfield. Consider mapping to genre_headings and subject_genre if ever meaningfully populated"}}, "022"=>{"m"=>{"omission_type"=>nil, "omission_reason_category"=>nil, "omission_details"=>nil}, "y"=>{"omission_type"=>nil, "omission_reason_category"=>nil, "omission_details"=>nil}, "z"=>{"omission_type"=>nil, "omission_reason_category"=>nil, "omission_details"=>nil}}}
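
The nested structure shown above (tag, then subfield, then omission details) can be queried directly. The following is a minimal sketch, not part of the original notebook: `omission_for` and `undocumented_omissions` are hypothetical helper names, the sample hash is copied from the output above, and treating a `nil` `omission_type` as "decision not yet documented" is an assumption.

```ruby
# Sample data copied from the output of the cell above.
sample_unmapped_subfields = {
  '567' => {
    'b' => { 'omission_type' => 'temporary',
             'omission_reason_category' => 'low priority',
             'omission_details' => 'UNC has no data in this subfield. Consider mapping to genre_headings and subject_genre if ever meaningfully populated' }
  },
  '022' => {
    'm' => { 'omission_type' => nil, 'omission_reason_category' => nil, 'omission_details' => nil },
    'y' => { 'omission_type' => nil, 'omission_reason_category' => nil, 'omission_details' => nil },
    'z' => { 'omission_type' => nil, 'omission_reason_category' => nil, 'omission_details' => nil }
  }
}

# Hypothetical helper: return the recorded omission info for one subfield,
# or nil when the tag or subfield has no entry.
def omission_for(unmapped_subfields, tag, subfield)
  unmapped_subfields.dig(tag, subfield)
end

# Hypothetical helper: list [tag, subfield] pairs whose omission_type is nil.
# Assumption: nil means the decision has not been documented yet.
def undocumented_omissions(unmapped_subfields)
  unmapped_subfields.flat_map do |tag, subfields|
    subfields.select { |_sf, info| info['omission_type'].nil? }
             .keys
             .map { |sf| [tag, sf] }
  end
end

info_567b = omission_for(sample_unmapped_subfields, '567', 'b')
todo = undocumented_omissions(sample_unmapped_subfields)
puts info_567b['omission_type']
todo.each { |tag, sf| puts "#{tag}$#{sf} has no documented omission reason" }
```

With the real data, passing `unmapped_subfields` instead of the sample hash would give a quick list of subfield omissions that may still need a reason recorded.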

Next, get our current MARC-to-Argot mappings from the Argot spreadsheet and turn them into a hash we can use:

In [26]:
require 'simple_xlsx_reader'
doc = SimpleXlsxReader.open('../argot/argot.xlsx')
mappings_d = doc.sheets[1].data
mappings_h = doc.sheets[1].headers
marc_only = mappings_d.select{ |r| r[2] == "MARC"}.map{ |r| [r[5].rjust(3,'0'), r[6], r[3]] }
# Remove mappings for fixed fields and those where data is a constant
marc_only.reject!{ |r| r[0] =~ /(00[678]|\{na\})/}
# Change {na}, {varies}, period, space, or parentheses in subfield list to nothing
marc_only.map!{ |r| [r[0], r[1].gsub(/\{(na|varies)\}|[.() ]/, ''), r[2]]}
# Convert subfield list to array
marc_only.map!{ |r| [r[0], r[1].split(''), r[2]]}
# Explode mappings list, so there's one array per subfield value
# orig: ["111", ["j", "4"], "n"]
# new: ["111", "j", "n"], ["111", "4", "n"]
flat_mappings = []
marc_only.each do |r|  
  r[1].each { |sf| flat_mappings << [r[0], sf, r[2]] }
end
mapping_hash = {}
# Use uniq rather than uniq!: uniq! returns nil when there are no duplicates
flat_mappings.uniq.each do |r|
  if mapping_hash.has_key?(r[0])
    mapping_hash[r[0]][r[1]] = r[2]
  else
    mapping_hash[r[0]] = {r[1] => r[2]}
  end
end
puts JSON.pretty_generate(mapping_hash)

{
  "581": {
    "a": "n",
    "z": "n",
    "3": "n"
  },
  "790": {
    "a": "n",
    "b": "n",
    "c": "n",
    "d": "n",
    "g": "n",
    "q": "n",
    "u": "n"
  },
  "791": {
    "a": "n",
    "b": "n",
    "c": "n",
    "d": "n",
    "f": "n",
    "g": "n"
  },
  "250": {
    "3": "n",
    "a": "n",
    "b": "n"
  },
  "254": {
    "a": "n"
  },
  "310": {
    "a": "n",
    "b": "n"
  },
  "321": {
    "a": "n",
    "b": "n"
  },
  "382": {
    "a": "y",
    "b": "y",
    "d": "y",
    "p": "y"
  },
  "384": {
    "a": "y"
  },
  "567": {
    "b": "n",
    "a": "n"
  },
  "653": {
    "a": "n"
  },
  "655": {
    "a": "y",
    "v": "n",
    "x": "y",
    "y": "n",
    "z": "n"
  },
  "599": {
    "b": "y"
  },
  "260": {
    "3": "n",
    "a": "n",
    "b": "n",
    "c": "n",
    "e": "n",
    "f": "n",
    "g": "n"
  },
  "264": {
    "3": "n",
    "a": "n",
    "b": "n",
    "c": "n"
  },
  "700": {
    "a": "n",
    "b": "n",
    "c": "n",
    "d": "n",
    "g": "n",
    "j

```
"581": {
    "a": "n",
    "z": "n",
    "3": "n"
  }
```

Subfields a, z, and 3 from the 581 field are mapped to Argot. 

The "n" at the end of each subfield line means these are final mappings for production. "y" in this position indicates a provisional mapping, which may not yet be implemented or which is subject to major changes.
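
To see at a glance which mappings are still provisional, the "y" flags can be pulled out of the hash. This is a minimal sketch, not part of the original notebook: `provisional_subfields` is a hypothetical helper name, and the sample hash is a small subset of the output above.

```ruby
# A few entries copied from the mapping hash printed above.
mapping_sample = {
  '581' => { 'a' => 'n', 'z' => 'n', '3' => 'n' },
  '382' => { 'a' => 'y', 'b' => 'y', 'd' => 'y', 'p' => 'y' },
  '655' => { 'a' => 'y', 'v' => 'n', 'x' => 'y', 'y' => 'n', 'z' => 'n' }
}

# Hypothetical helper: for each tag, collect the subfields flagged "y"
# (provisional). Tags with no provisional subfields are left out.
def provisional_subfields(mapping_hash)
  mapping_hash.each_with_object({}) do |(tag, subfields), result|
    provisional = subfields.select { |_sf, flag| flag == 'y' }.keys
    result[tag] = provisional unless provisional.empty?
  end
end

provisional = provisional_subfields(mapping_sample)
provisional.sort.each { |tag, sfs| puts "#{tag}: #{sfs.join(', ')}" }
```

In the notebook itself, the same call could be made on `mapping_hash` once the cell above has run.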

## Unmapped MARC, field tag level

The block below reports on the fields we've explicitly decided not to map.

In [23]:
unmapped_tag_table = { 'MARC tag' => [],
                       'Name' => [],
                       'Omission type' => [],
                       'Omission category' => [],
                       'Details' => []
                     }
unmapped_fields.sort.each do |tag, info|
  unmapped_tag_table['MARC tag'] << tag
  unmapped_tag_table['Name'] << marc_spec[tag]['name']
  unmapped_tag_table['Omission type'] << info['omission_type']
  unmapped_tag_table['Omission category'] << info['omission_reason_category']
  unmapped_tag_table['Details'] << info['omission_details']
end
IRuby.display IRuby.table(unmapped_tag_table, maxrows: 100)

MARC tag,Name,Omission type,Omission category,Details
306,PLAYING TIME,permanent,data unusable,"Usually duplicates data recorded in 300 and/or 505. Recorded in a cryptic manner and there's no way to specify which duration recorded goes with which part of the described item, if it's a multi-part thing"
307,"HOURS, ETC.",permanent,irrelevant,"No fields in UNC catalog. Hours an item is available is kind of a weird concept in the catalog, too."
336,CONTENT TYPE,temporary,low priority,Terms used are not user-friendly; this information is usually better expressed elsewhere in the record
337,MEDIA TYPE,temporary,low priority,Terms used are not user-friendly; this information is usually better expressed elsewhere in the record
338,CARRIER TYPE,temporary,low priority,Terms used are not user-friendly; this information is usually better expressed elsewhere in the record
342,GEOSPATIAL REFERENCE DATA,temporary,low priority,"Complex data mapping; appears in <2000 records in UNC catalog on 2018-05-21. Need to assess if this adds useful info, and if so, how to map."
343,PLANAR COORDINATE DATA,temporary,infrequently used,only 2 records in UNC catalog
348,FORMAT OF NOTATED MUSIC,temporary,low priority,Look into whether it should map to genre
355,SECURITY CLASSIFICATION CONTROL,permanent,irrelevant,No records in UNC catalog
357,ORIGINATOR DISSEMINATION CONTROL,permanent,irrelevant,No records in UNC catalog



The following cell lists all other **non-obsolete, non-local** MARC variable fields that are unmapped and that we haven't made an explicit decision about:

In [24]:
unmapped = {}
marc_spec.each do |tag, info|
  if mapping_hash.has_key?(tag)
    next
  elsif unmapped_fields.has_key?(tag)
    next
  elsif tag =~ /00[356789]/
    next
  elsif info['context'] =~ /OBSOLETE|LOCAL/
    next
  else
    unmapped[tag] = {'name' => info['name'].strip,
                     'context' => info['context']
      }
  end
end

table2 = { 'MARC tag' => [],
           'Description' => [],
           'Note' => []
         }
unmapped.sort.each do |tag, info|
  table2['MARC tag'] << tag
  table2['Description'] << info['name']
  table2['Note'] << info['context']
end
IRuby.display IRuby.table(table2, maxrows: 100)

MARC tag,Description,Note
001,CONTROL NUMBER,
013,PATENT CONTROL INFORMATION,
016,NATIONAL BIBLIOGRAPHIC AGENCY CONTROL NUMBER,
017,COPYRIGHT OR LEGAL DEPOSIT NUMBER,
018,COPYRIGHT ARTICLE-FEE CODE,
025,OVERSEAS ACQUISITION NUMBER,
026,FINGERPRINT IDENTIFIER,
031,MUSICAL INCIPITS INFORMATION,
032,POSTAL REGISTRATION NUMBER,
033,DATE/TIME AND PLACE OF AN EVENT,
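
The filtering rules used to build this list can also be written as a single predicate, which makes it easier to test them on a small sample. This is a minimal sketch, not the notebook's own code: `undecided_fields` is a hypothetical name, the sample entries are illustrative only, and the `'name'`/`'context'` keys and the `OBSOLETE`/`LOCAL` context values are assumed from how `marc_spec` is used above.

```ruby
# Hypothetical helper: keep only spec entries that are not mapped, have no
# explicit omission decision, are not an excluded control field, and are not
# marked obsolete or local.
def undecided_fields(marc_spec, mapping_hash, unmapped_fields)
  marc_spec.reject do |tag, info|
    mapping_hash.key?(tag) ||
      unmapped_fields.key?(tag) ||
      tag.match?(/\A00[356789]\z/) ||
      info['context'].to_s.match?(/OBSOLETE|LOCAL/) # to_s guards against nil
  end
end

# Illustrative sample data (structure assumed, values for demonstration only).
sample_spec = {
  '001' => { 'name' => 'CONTROL NUMBER', 'context' => nil },
  '008' => { 'name' => 'FIXED-LENGTH DATA ELEMENTS', 'context' => nil },
  '013' => { 'name' => 'PATENT CONTROL INFORMATION', 'context' => nil },
  '211' => { 'name' => 'ACRONYM OR SHORTENED TITLE', 'context' => 'OBSOLETE' },
  '250' => { 'name' => 'EDITION STATEMENT', 'context' => nil },
  '306' => { 'name' => 'PLAYING TIME', 'context' => nil }
}
sample_mapped = { '250' => { 'a' => 'n' } }
sample_decided = { '306' => { 'omission_type' => 'permanent' } }

undecided = undecided_fields(sample_spec, sample_mapped, sample_decided)
undecided.sort.each { |tag, info| puts "#{tag}  #{info['name']}" }
```

Here 008 is dropped by the control-field pattern, 211 by its context, 250 because it is mapped, and 306 because an omission decision exists, leaving 001 and 013.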

