-
Notifications
You must be signed in to change notification settings - Fork 749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Libraries Tasmania translator #2832
Conversation
Thanks, this is awesome work! That said... I think the issue might just be that we don't have a (working?) translator that targets SIRSI Dynix. This site looks similar, as does this one and this one. None of them are detected by any translator on my end. Could you test out if this translator can target other Dynix sites? If so, let's just make it more general. |
Good idea, but I think the Libraries Tasmania catalogue is too heavily customized for the translator to work with more generic SirsiDynix systems. Looking at the two examples you mentioned, there are some major differences:
On top of that Libraries Tas adds permalinks, digitised items, and separate custom catalogues for archival material. None of which seem to be standard. From a quick poke around, it looks like a translator could be created to work with the more generic SirsiDynix systems you've linked to, but I don't think there'd be much overlap with the Libraries Tas translator. |
Ah looking closer it seems you can get the item identifiers from the onclick event on the titles, but it's still different to Libraries Tas which has the full item url in the href. |
Is there anything else you'd like me to do? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments:
Libraries Tasmania.js
Outdated
if (url.includes("results") && getSearchResults(doc, true)) { | ||
return "multiple"; | ||
} | ||
else if (catType == "tas" || url.includes("ARCHIVES_")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
catType ==
might be a little sketchy - String#match
returns an array, or null if there was no match. Instead, how about catType && catType[1] == "tas"
(and similar for the check below)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done.
Libraries Tasmania.js
Outdated
return "manuscript"; | ||
} | ||
else { | ||
var formats = doc.querySelectorAll("div.displayElementText.text-p.LOCAL_FORMAT"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all these classes necessary or would just ".LOCAL_FORMAT"
work? For stability if the site changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .LOCAL_FORMAT
class is also used in the field label, but simplifying to .displayElementText.LOCAL_FORMAT
seems ok.
Libraries Tasmania.js
Outdated
if (!items) { | ||
return true; | ||
} | ||
var articles = []; | ||
for (var i in items) { | ||
articles.push(i); | ||
} | ||
ZU.processDocuments(articles, scrape); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These days this can be simplified to
if (!items) { | |
return true; | |
} | |
var articles = []; | |
for (var i in items) { | |
articles.push(i); | |
} | |
ZU.processDocuments(articles, scrape); | |
return false; | |
if (!items) return; | |
ZU.processDocuments(Object.keys(items), scrape); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done.
Libraries Tasmania.js
Outdated
} | ||
|
||
function getFieldText(doc, label) { | ||
let fields = doc.querySelectorAll("div.displayElementText.text-p." + label); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question about the selectors here - it seems like we could at least drop the div
tag selector, and maybe .displayElementText
, since that seems like a more visual and less semantic class name and is liable to change. We could avoid false positives by running querySelectorAll()
on a smaller scope instead of the full document, like the main metadata table element if there is a consistent one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, I've kept .displayElementText
to distinguish from the label, but removed the rest.
Libraries Tasmania.js
Outdated
|
||
function getLinkLists(doc, label, idx) { | ||
var values = []; | ||
let links = doc.querySelectorAll("div." + label + " a[href*='" + idx + "']"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the div
necessary here? (Sorry for being annoying about these!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed div
Libraries Tasmania.js
Outdated
let physParts = physDesc.split(";"); | ||
if (physParts.length == 2 && format == "book") { | ||
item.numPages = cleanText(physParts[0]); | ||
item["physical desription"] = cleanText(physParts[1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected
Libraries Tasmania.js
Outdated
var item = new Zotero.Item("manuscript"); | ||
|
||
// Types should be Agency, Series, or Item | ||
let typeLabel = ZU.trim(doc.querySelector(".T245_DISPLAY_label").textContent).replace(/:$/, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stuff like this should use text(doc, ...)
instead of querySelector(...).textContent
for null-safety:
let typeLabel = ZU.trim(doc.querySelector(".T245_DISPLAY_label").textContent).replace(/:$/, ""); | |
let typeLabel = ZU.trim(text(doc, ".T245_DISPLAY_label")).replace(/:$/, ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does text()
come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's added to the translation sandbox for web and search translators. There's also innerText
, attr
(same idea but takes an attribute name as the last argument and returns the value of that attribute on the matched element, or the empty string), and now request
(async HTTP).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed in this instance and in a couple of other places.
Libraries Tasmania.js
Outdated
]; | ||
// Add field values to Extras | ||
for (let i = 0; i < fields.length; ++i) { | ||
let label = doc.querySelector("div." + fields[i] + "_label"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Selector nitpick!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
div
removed
Libraries Tasmania.js
Outdated
addDigitalFiles(doc, item); | ||
} | ||
|
||
// Zotero.debug(item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Libraries Tasmania.js
Outdated
"abstractNote": "Official minutes of meetings. | These records are part of the holdings of the Tasmanian Archives", | ||
"callNumber": "AD940", | ||
"libraryCatalog": "Libraries Tasmania", | ||
"manuscriptType": "Series", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if these really belong in manuscriptType
... Maybe a line in extra
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the type of archival entity being described -- series, item, or agency -- so I think manuscriptType
is the best place for it. Having it there also makes it much easier to browse/filter different types in the Zotero interface. This is also consistent with other translators including National Archives of Australia and State Records Office of WA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm inclined to agree, but to make sure -- is this expected to be part of citations, typically? manuscriptType
aka CSL's genre
is used a fair amount in citation styles, so this would show up regularly. If that makes sense, I think this is definitely the right field. If it doesn't, I'd reconsider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a couple of changes. I've kept manuscriptType
to indicate the archival entity type ('series', 'item' etc). This is consistent with other translators (those noted above plus the UK National Archives) and is most likely to be used in citations. I've also lowercased these values for consistency.
I was also using manuscriptType
for the type of index for names index records. I've changed these so the manuscriptType
is now 'item', and the index title is saved in 'Extras'.
And makes sense about the scope. We can do a different translator, or expand the existing SIRSI one, for other sites that this can't cover. |
Hope it's right to go now, want to give it a shout out in a talk I'm giving tomorrow! |
Ack. I'm really sorry I didn't get to this in time. Looks good to go. I am curious why some of the |
This is a translator for Libraries Tasmania. The catalogue uses SirsiDynix, but none of the existing translators worked and the system has been customised to include a range of important archival material. There's actually three catalogues rolled into one:
The translator captures data from all three sections. Many of the records in the Library section have MARC views, so I was able to use the existing MARC translator for those. Some elements are generated by JS, so they took a bit of tracking down. The digital files are rendered and linked to in a variety of different ways. I tried many different solutions, but this one seems to work ok in most cases. The main limit is that if a record links to a multi-image viewer, without a reference to a specific page, no images are saved. But if it's a single image viewer, or there's a link identifying the page, the image/file will be saved -- both images and PDFs. In some cases, such as the Convict Name index, you get multiple image attachments which is pretty cool.