Add Libraries Tasmania translator #2832

wragge · 2022-06-13T07:42:32Z

This is a translator for Libraries Tasmania. The catalogue uses SirsiDynix, but none of the existing translators worked and the system has been customised to include a range of important archival material. There are actually three catalogues rolled into one:

the Library catalogue for all the normal library stuff (but also including things like archived websites)
the Archives catalogue
the Name Index catalogue -- basically transcribed entries from certain archive collections

The translator captures data from all three sections. Many of the records in the Library section have MARC views, so I was able to use the existing MARC translator for those. Some elements are generated by JS, so they took a bit of tracking down. The digital files are rendered and linked to in a variety of different ways. I tried many different solutions, but this one seems to work ok in most cases. The main limit is that if a record links to a multi-image viewer, without a reference to a specific page, no images are saved. But if it's a single image viewer, or there's a link identifying the page, the image/file will be saved -- both images and PDFs. In some cases, such as the Convict Name index, you get multiple image attachments which is pretty cool.

AbeJellinek · 2022-06-16T23:09:52Z

Thanks, this is awesome work! That said... I think the issue might just be that we don't have a (working?) translator that targets SIRSI Dynix. This site looks similar, as does this one and this one. None of them are detected by any translator on my end. Could you test out if this translator can target other Dynix sites? If so, let's just make it more general.

wragge · 2022-06-17T01:04:07Z

Good idea, but I think the Libraries Tasmania catalogue is too heavily customized for the translator to work with more generic SirsiDynix systems. Looking at the two examples you mentioned, there are some major differences:

The links on items in the search results don't include any item identifiers. To get a link to an item page I think you'd have to get an identifier from the 'Place hold' button (or somewhere else) and build a url. The links in Libraries Tas go to the item page.
The other systems don't seem to provide the MARC view that Libraries Tas includes in many of the bib records.
It looks like most of the field/class names are different in the item records, so you'd need another set of selectors anyway.

On top of that, Libraries Tas adds permalinks, digitised items, and separate custom catalogues for archival material, none of which seem to be standard.

From a quick poke around, it looks like a translator could be created to work with the more generic SirsiDynix systems you've linked to, but I don't think there'd be much overlap with the Libraries Tas translator.

wragge · 2022-06-17T01:08:02Z

Ah looking closer it seems you can get the item identifiers from the onclick event on the titles, but it's still different to Libraries Tas which has the full item url in the href.

wragge · 2022-06-21T05:25:32Z

Is there anything else you'd like me to do?

AbeJellinek

Some comments:

AbeJellinek · 2022-06-30T18:42:55Z

Libraries Tasmania.js

+	if (url.includes("results") && getSearchResults(doc, true)) {
+		return "multiple";
+	}
+	else if (catType == "tas" || url.includes("ARCHIVES_")) {


catType == might be a little sketchy - String#match returns an array, or null if there was no match. Instead, how about catType && catType[1] == "tas" (and similar for the check below)?
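To illustrate the point about String#match, here is a minimal sketch. The regular expression and the example.org URLs are assumptions made up for the example and are not taken from the translator; only the `catType && catType[1] == "tas"` check comes from the comment above.

```javascript
// Hypothetical pattern and URLs, for illustration only.
const pattern = /\/client\/en_AU\/([a-z]+)\//;

// On a match, String#match returns an array: [fullMatch, group1, ...].
const catType = "https://example.org/client/en_AU/tas/search/results".match(pattern);

// Comparing the array itself to a string coerces it to
// "/client/en_AU/tas/,tas", so this check is false even though group 1 is "tas".
const looseCheck = catType == "tas";

// The suggested form checks for null first, then reads the capture group.
const safeCheck = Boolean(catType && catType[1] == "tas");

// With no match, String#match returns null; the guard avoids a TypeError.
const noMatch = "https://example.org/other/page".match(pattern);
const safeOnNull = Boolean(noMatch && noMatch[1] == "tas");

console.log(looseCheck, safeCheck, safeOnNull);
```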

AbeJellinek · 2022-06-30T18:43:31Z

Libraries Tasmania.js

+		return "manuscript";
+	}
+	else {
+		var formats = doc.querySelectorAll("div.displayElementText.text-p.LOCAL_FORMAT");


Are all these classes necessary or would just ".LOCAL_FORMAT" work? For stability if the site changes.

The .LOCAL_FORMAT class is also used in the field label, but simplifying to .displayElementText.LOCAL_FORMAT seems ok.

AbeJellinek · 2022-06-30T18:44:19Z

Libraries Tasmania.js

+			if (!items) {
+				return true;
+			}
+			var articles = [];
+			for (var i in items) {
+				articles.push(i);
+			}
+			ZU.processDocuments(articles, scrape);
+			return false;


These days this can be simplified to

Suggested change

-			if (!items) {
-				return true;
-			}
-			var articles = [];
-			for (var i in items) {
-				articles.push(i);
-			}
-			ZU.processDocuments(articles, scrape);
-			return false;
+			if (!items) return;
+			ZU.processDocuments(Object.keys(items), scrape);
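As a small sketch of why the two forms are interchangeable here: for a plain object mapping URLs to titles, a `for...in` loop and `Object.keys()` yield the same list of keys. The `items` object below is an invented stand-in for whatever the selection callback receives, and the URLs are placeholders.

```javascript
// Invented stand-in for the selected items (URL -> title).
const items = {
	"https://example.org/item/1": "First record",
	"https://example.org/item/2": "Second record"
};

// Original approach: collect keys with a for...in loop.
const viaLoop = [];
for (const url in items) {
	viaLoop.push(url);
}

// Simplified approach: Object.keys returns the object's own enumerable keys.
const viaKeys = Object.keys(items);

console.log(JSON.stringify(viaLoop) === JSON.stringify(viaKeys));
```

One difference worth keeping in mind: `for...in` also visits inherited enumerable properties, while `Object.keys()` does not. For a plain object like this one, the results match.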

AbeJellinek · 2022-06-30T18:45:57Z

Libraries Tasmania.js

+}
+
+function getFieldText(doc, label) {
+	let fields = doc.querySelectorAll("div.displayElementText.text-p." + label);


Same question about the selectors here - it seems like we could at least drop the div tag selector, and maybe .displayElementText, since that seems like a more visual and less semantic class name and is liable to change. We could avoid false positives by running querySelectorAll() on a smaller scope instead of the full document, like the main metadata table element if there is a consistent one.

As above, I've kept .displayElementText to distinguish from the label, but removed the rest.

AbeJellinek · 2022-06-30T18:47:02Z

Libraries Tasmania.js

+
+function getLinkLists(doc, label, idx) {
+	var values = [];
+	let links = doc.querySelectorAll("div." + label + " a[href*='" + idx + "']");


Is the div necessary here? (Sorry for being annoying about these!)

Removed div

AbeJellinek · 2022-06-30T18:50:36Z

Libraries Tasmania.js

+		let physParts = physDesc.split(";");
+		if (physParts.length == 2 && format == "book") {
+			item.numPages = cleanText(physParts[0]);
+			item["physical desription"] = cleanText(physParts[1]);


AbeJellinek · 2022-06-30T18:51:28Z

Libraries Tasmania.js

+	var item = new Zotero.Item("manuscript");
+
+	// Types should be Agency, Series, or Item
+	let typeLabel = ZU.trim(doc.querySelector(".T245_DISPLAY_label").textContent).replace(/:$/, "");


Stuff like this should use text(doc, ...) instead of querySelector(...).textContent for null-safety:

Suggested change

-	let typeLabel = ZU.trim(doc.querySelector(".T245_DISPLAY_label").textContent).replace(/:$/, "");
+	let typeLabel = ZU.trim(text(doc, ".T245_DISPLAY_label")).replace(/:$/, "");

Where does text() come from?

From here: https://github.com/zotero/translate/blob/e08e318997b80685f03fbc4aae2cf6944fe46cfc/src/translation/translate.js#L1879-L1896

It's added to the translation sandbox for web and search translators. There's also innerText, attr (same idea but takes an attribute name as the last argument and returns the value of that attribute on the matched element, or the empty string), and now request (async HTTP).

Changed in this instance and in a couple of other places.
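For readers unfamiliar with these helpers, the null-safety idea can be sketched roughly as follows. `safeText`, `safeAttr`, and `fakeDoc` are hypothetical names written for this illustration; they are not the Zotero source (see the link above for that) and only approximate the behaviour described in the comment, returning an empty string when nothing matches.

```javascript
// Hypothetical stand-ins for the sandbox helpers; not the Zotero implementation.
function safeText(docOrElem, selector) {
	const elem = docOrElem.querySelector(selector);
	return elem ? elem.textContent : "";
}

function safeAttr(docOrElem, selector, name) {
	const elem = docOrElem.querySelector(selector);
	return elem ? (elem.getAttribute(name) || "") : "";
}

// Minimal fake document so the sketch runs outside a browser.
const fakeDoc = {
	querySelector(selector) {
		if (selector !== ".T245_DISPLAY_label") return null;
		return {
			textContent: " Series: ",
			getAttribute: name => (name === "class" ? "T245_DISPLAY_label" : null)
		};
	}
};

// Matched element: the chained string calls behave as usual.
const found = safeText(fakeDoc, ".T245_DISPLAY_label").trim().replace(/:$/, "");

// Missing element: an empty string comes back instead of a TypeError
// from reading .textContent on null.
const missing = safeText(fakeDoc, ".NOT_THERE").trim().replace(/:$/, "");
const missingAttr = safeAttr(fakeDoc, ".NOT_THERE", "href");

console.log(JSON.stringify([found, missing, missingAttr]));
```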

AbeJellinek · 2022-06-30T18:52:21Z

Libraries Tasmania.js

+	];
+	// Add field values to Extras
+	for (let i = 0; i < fields.length; ++i) {
+		let label = doc.querySelector("div." + fields[i] + "_label");


Selector nitpick!

div removed

AbeJellinek · 2022-06-30T18:52:29Z

Libraries Tasmania.js

+	addDigitalFiles(doc, item);
+}
+
+// Zotero.debug(item);


Can remove this

AbeJellinek · 2022-06-30T18:53:14Z

Libraries Tasmania.js

+				"abstractNote": "Official minutes of meetings. | These records are part of the holdings of the Tasmanian Archives",
+				"callNumber": "AD940",
+				"libraryCatalog": "Libraries Tasmania",
+				"manuscriptType": "Series",


I don't know if these really belong in manuscriptType... Maybe a line in extra?

These are the type of archival entity being described -- series, item, or agency -- so I think manuscriptType is the best place for it. Having it there also makes it much easier to browse/filter different types in the Zotero interface. This is also consistent with other translators including National Archives of Australia and State Records Office of WA.

I'm inclined to agree, but to make sure -- is this expected to be part of citations, typically? manuscriptType aka CSL's genre is used a fair amount in citation styles, so this would show up regularly. If that makes sense, I think this is definitely the right field. If it doesn't, I'd reconsider.

I've made a couple of changes. I've kept manuscriptType to indicate the archival entity type ('series', 'item' etc). This is consistent with other translators (those noted above plus the UK National Archives) and is most likely to be used in citations. I've also lowercased these values for consistency.

I was also using manuscriptType for the type of index for names index records. I've changed these so the manuscriptType is now 'item', and the index title is saved in 'Extras'.

AbeJellinek · 2022-06-30T18:59:30Z

And makes sense about the scope. We can do a different translator, or expand the existing SIRSI one, for other sites that this can't cover.

wragge · 2022-07-07T01:21:44Z

Hope it's right to go now, want to give it a shout out in a talk I'm giving tomorrow!

AbeJellinek · 2022-07-13T16:22:43Z

Ack. I'm really sorry I didn't get to this in time. Looks good to go. I am curious why some of the publisher and label fields aren't getting split into place, publisher/label, and date, since the regex for that looks good, but we can work on that later.

wragge added 5 commits June 12, 2022 17:25

Add Libraries Tasmania

a029016

More tests

e10d86a

Digital images now found when saving multiple records

2131ec6

Add urls to items without permalinks. Fix for missing digitsed file l…

1a88987

…abels. Update tests.

Adjustments to permalinks

657f0be

Attach snapshots

0f4904d

AbeJellinek requested changes Jun 30, 2022

wragge added 2 commits July 5, 2022 12:35

Make changes suggested by review

5c09071

Make manscriptType values more consistent

fd66c7b

AbeJellinek merged commit e3b6326 into zotero:master Jul 13, 2022

AbeJellinek left a comment

AbeJellinek Jun 30, 2022

wragge Jul 4, 2022

AbeJellinek Jun 30, 2022

wragge Jul 4, 2022

AbeJellinek Jun 30, 2022

wragge Jul 4, 2022

AbeJellinek Jun 30, 2022

wragge Jul 4, 2022

AbeJellinek Jun 30, 2022

wragge Jul 5, 2022

AbeJellinek Jun 30, 2022

wragge Jul 5, 2022

AbeJellinek Jun 30, 2022

wragge Jun 30, 2022

AbeJellinek Jun 30, 2022

wragge Jul 5, 2022

AbeJellinek Jun 30, 2022

wragge Jul 5, 2022

AbeJellinek Jun 30, 2022

wragge Jul 5, 2022

AbeJellinek Jun 30, 2022

wragge Jul 5, 2022

adam3smith Jul 5, 2022

wragge Jul 7, 2022
