Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Web of Science Tagged: More robust against trailing space characters. #3053

Conversation

zoe-translates
Copy link
Collaborator

If there are trailing whitespace characters after the EF "tag", the script fails because it interprets "EF" as an unknown tag rather than end of file.

This is fixed by interpreting EF as intended, see the table of export field tags at Clarivate:
https://webofscience.help.clarivate.com/en-us/Content/export-records.htm#mc-dropdown-body6dc16a03-4cb8-43e1-b72a-ff5a3197f8f0

Bug reported by gxy1932582, see
https://forums.zotero.org/discussion/comment/436794/#Comment_436794

If there are trailing whitespace characters after the EF "tag", the
script fails because it interprets "EF" as an unknown tag rather than
end of file.

This is fixed by interpreting EF as intended, see the table of export
field tags at Clarivate:
https://webofscience.help.clarivate.com/en-us/Content/export-records.htm#mc-dropdown-body6dc16a03-4cb8-43e1-b72a-ff5a3197f8f0

Bug reported by gxy1932582, see
https://forums.zotero.org/discussion/comment/436794/#Comment_436794
@zoe-translates
Copy link
Collaborator Author

Note that the check failed (timed out) because the Selenium test doesn't run for non-Web translators. (zotero/zotero-connectors#429). The fix is a bit complicated, and it needs to be done in multiple places.

@@ -229,6 +229,11 @@ function doImport(text) {
while ((rawLine = Zotero.read()) !== false) { // until EOF
//Z.debug("line: " + rawLine);
let split = rawLine.match(/^([A-Z0-9]{2})\s(?:([^\n]*))?/);
// EF is equivalent to the end of file.
if (/^EF\s*/.test(rawLine)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The \s* isn't doing anything here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. I probably meant /^EF\s*$/ which includes EF or EF but not EFF or something. But even this may be unnecessary? Simply terminating on rawLine.startsWith("EF") should be alright? There are no three-letter tags yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that's fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least assuming every line is guaranteed to start with a tag… How are multi-line fields (e.g., multi-line notes) handled?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason not to do /^EF\s*$/, I suppose.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are multi-line fields (e.g., multi-line notes) handled?

The 2nd line in a multi-line value will start with three spaces. Same with "array" values such as list of authors. So a wrapped line in the abstract that happens to start with "EF", or an author name like EFSTATHIOU will not be confused with end-of-file (in a properly formatted file).

@@ -229,6 +229,11 @@ function doImport(text) {
while ((rawLine = Zotero.read()) !== false) { // until EOF
//Z.debug("line: " + rawLine);
let split = rawLine.match(/^([A-Z0-9]{2})\s(?:([^\n]*))?/);
// EF is equivalent to the end of file.
if (/^EF\s*/.test(rawLine)) {
Z.debug("Encountered EF (equivalent to End of File)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment out

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, right, there's already a debug message.

- Remove comment that doesn't add to info.
- Make the EF short-circuiting next to the while condition.
@zoe-translates
Copy link
Collaborator Author

There's more to this error the user is encountering. There's a lot more ways completeItem() may fail because of missing checks.

@zoe-translates
Copy link
Collaborator Author

To be honest, the logic of the existing code is a bit baffling to me. I think it's worthwhile to do first improve the underlying structure of the code before fixing these errors that arose from the lack of robustness of that structure. This is continued in #3062.

I am going to close this one because the effort here has been superseded. Thank you @dstillman for the comments -- those were not in vain, because the ideas will turn out in different ways in the new PR.

@dstillman
Copy link
Member

trimEnd() isn't available in Zotero 6, so we can't use it yet

cc7c3c7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants