marclojure is a library for - can you guess? - processing MARC records using Clojure. It can be used to serialize MARC records in ISO 2709 (MARC exchange format), MARCXML or Aleph Sequential formats into Clojure maps, process them and write them back to file. Writing is currently possible in MARCXML and Aleph Sequential, ISO 2709 is going to be supported very soon.
marclojure is available from Clojars. Add it to your project.clj
as follows:
[marclojure "1.0.6-SNAPHOT"]
Then you can require it into your namespace:
(ns foo.bar
(:require [marclojure.core :as marc]
[marclojure.parser :as parser]
[marclojure.writer :as writer]))
MARC batch files can be read into lazy sequences using the load-data
multimethod from marclojure.parser
namespace.
Load-data accepts two arguments: file format (keyword, possible options are :marc
, :marcxml
or :aleph
) and a filename.
In older versions of marclojure, system-specific fields (LOW, SID, FMT etc.) were not retained when parsing Aleph Sequential data. From 1.0.4 they are retained and can be optionally weeded by calling marclojure.core/remove-aleph-fields on the record.
An example:
(def dataset (parser/load-data :marc "somefile.mrc"))
=> #'foo.bar/dataset
Serialized records are represented as Clojure maps. The format looks as follows:
{:bibid "2"
:leader "01066cam a22003137i 4500"
:fields [{:type "controlfield", :tag "001", :data " 2"}
{:type "controlfield", :tag "005", :data "20120402125847.0"}
{:type "controlfield", :tag "008", :data "881209s1986 fr ||||||b |||||||eng||"}
{:type "datafield"
:tag "020"
:i1 " "
:i2 " "
:subfields [{:code "a", :data "9780306406157 (hbk.)"}]}
{:type "datafield", :tag "245"
:i1 "0"
:i2 "0"
:subfields [{:code "a", :data "Health education intervention :"}
{:code "b", :data "an annotated bibliography /"}
{:code "c", :data "Unesco Nutrition Education Programme ; Division of Science, Technical and Environmental Education, Unesco."}]}
{:type "datafield"
:tag "260"
:i1 " "
:i2 " "
:subfields [{:code "a", :data "Paris :"}
{:code "b", :data "Unesco,"}
{:code "c", :data "1986."}]}
{:type "datafield"
:tag "300"
:i1 " "
:i2 " "
:subfields [{:code "a", :data "103 sivua"}]}
{:type "datafield"
:tag "336"
:i1 " "
:i2 " "
:subfields [{:code "a", :data "teksti"}
{:code "b", :data "txt"}
{:code "2", :data "rdacontent"}]}
{:type "datafield"
:tag "337"
:i1 " "
:i2 " "
:subfields [{:code "a", :data "käytettävissä ilman laitetta"}
{:code "b", :data "n"}
{:code "2", :data "rdamedia"}]}
{:type "datafield"
:tag "338"
:i1 " "
:i2 " "
:subfields [{:code "a", :data "nide"}
{:code "b", :data "nc"}
{:code "2", :data "rdacarrier"}]}
{:type "datafield"
:tag "490"
:i1 "1"
:i2 " "
:subfields [{:code "a", :data "Nutrition education series ;"}
{:code "v", :data "13"}]}
{:type "datafield"
:tag "515"
:i1 " "
:i2 " "
:subfields [{:code "a", :data "Unesco doc. ED-86/WS/83."}]}
{:type "datafield"
:tag "650"
:i1 " "
:i2 "7"
:subfields [{:code "a", :data "terveydenhuolto"}
{:code "x", :data "bibliografia"}
{:code "2", :data "eks"}]}
{:type "datafield"
:tag "830"
:i1 " "
:i2 "0"
:subfields [{:code "a", :data "Nutrition education series ;"}
{:code "v", :data "13."}]}
{:type "datafield"
:tag "852"
:i1 " "
:i2 " "
:subfields [{:code "a", :data "FI-E"}
{:code "b", :data "IV.3."}
{:code "c", :data "Unesco 2-464"}]}]}
Apart from parsing MARC data, the marclojure.core
namespace provides some utility functions
for processing record sequences. Some examples (here the core namespace is loaded as marc
, see above).
(def batch (parser/load-data :marc "marcdata.mrc"))
=> #'foo.bar/batch
(def record (first batch))
=> #'foo.bar/record
(marc/print-to-repl record))
=>
"000 00000cam^a22004097i^4500
001 000000002
005 20160406135147.0
008 850308s1980^^^^sz^|||||||||||||||||fre||
041 0 $afre
080 $a696/697
080 $a296.63
080 $a929 Josephus
100 0 $aSzyszman, Simon.
245 13 $aLe karaïsme :$bses doctrines et son histoire /$cSimon Szyszman.
260 $aLausanne :$bL'Age d'Homme,$c1980.
300 $a247 s., 24 pl. :$bill., kart.
336 $ateksti$btxt$2rdacontent
337 $akäytettävissä ilman laitetta$bn$2rdamedia
338 $anide$bnc$2rdacarrier
490 1 $aBibliotheca karaitica. Series A ;$vvol. 1
650 7 $atalotekniikka$2ysa"
(-> record (marc/get-fields "245") first field-to-string)
=> "245 13 $aLe karaïsme :$bses doctrines et son histoire /$cSimon Szyszman."
(marc/get-subfields "245" "a" record)
=> ({:code "a", :data "La karaisme"})
(marc/print-to-file batch "outputfile.txt")
=> nil
(marc/print-ids-to-file batch "outputfile_ids.txt")
=> nil
(marc/record-contains-phrase? ["lausanne" "hard rock"] record)
=> true
(marc/contains-field? "130" record)
=> false
(marc/field-contains-phrase? "100" ["Simon"] record)
=> true
Writing records to file is done as follows:
(writer/write-data :marcxml batch "outputfile.xml")
=> nil
(writer/write-data :aleph batch "outputfile.seq")
=> nil
marclojure
uses marc4j for reading MARC data. Thanks for that!
Aleph Sequential parser is based on clj-marc.
Copyright © 2017-2021 Tuomo Virolainen
Distributed under the Eclipse Public License either version 1.0.