Update DV to work properly for datasets #3024

Merged 4 commits on May 20, 2023


adam3smith
Collaborator

Note that a bunch of the tests are currently timing out, so I updated them manually -- this appears to affect all current Dataverse installations (the ones that still run use older versions), so we want to keep the tests in there and hope that Z7 fixes this.

@adam3smith
Collaborator Author

@AbeJellinek you created the original regex. There have since been ~30 additional installations.
Going by the tests and some clicking around, couldn't we get away with the much simpler /\/dataverse\/|\/file.xhtml\?|\/dataset.xhtml\?/, i.e. simply matching the part of the URL after the hostname? That shouldn't be much more costly than the regex we're running now. (I've also asked on their mailing list, but maybe you remember why you didn't go by that in the original version.)
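
For illustration, here is a minimal sketch of how the path-only pattern quoted above behaves. The pattern is copied verbatim from the comment; the sample URLs, the `example.org` host, and the `matchesPath` helper are hypothetical and not taken from the translator or from any real installation. A URL that consists of only a hostname contains none of the three path fragments, so the pattern does not match it.

```javascript
// Path-only pattern, copied verbatim from the comment above.
const pathPattern = /\/dataverse\/|\/file.xhtml\?|\/dataset.xhtml\?/;

// Hypothetical URLs for illustration; example.org is not a real installation.
const samples = [
  "https://data.example.org/dataverse/somecollection",
  "https://data.example.org/dataset.xhtml?persistentId=doi:10.1234/ABC",
  "https://data.example.org/file.xhtml?fileId=42",
  "https://data.example.org/", // bare site root: no path fragment to match
];

// Hypothetical helper: true if the URL contains one of the path fragments.
function matchesPath(url) {
  return pathPattern.test(url);
}

for (const url of samples) {
  console.log(matchesPath(url), url);
}
```

The first three sample URLs match because each contains one of the path fragments; the bare site root does not.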

@adam3smith
Collaborator Author

Never mind -- the root collection/dataverse is the problem.
Here's an R script to check for unsupported installations:

library(jsonlite)

# Load the published list of Dataverse installations
dvInstants <- fromJSON("https://iqss.github.io/dataverse-installations/data/data.json")

dvInstants <- dvInstants$installations

dvRegex <- "^https?:\\/\\/(www\\.)?((open|research-?|hei|planetary-|osna|in|bonn|borealis|lida\\.|archaeology\\.|entrepot\\.recherche\\.|archive\\.|redape\\.)?(data|e?da[td]os)|dvn|sodha\\.be|repositorio(\\.|dedados|pesquisas)|darus|abacus\\.library\\.ubc\\.ca|dorel\\.univ-lorraine\\.fr|dunas\\.ua\\.pt|edmond\\.mpdl\\.mpg\\.de|keen\\.zih\\.tu-dresden\\.de|rdr\\.kuleuven\\.be|portal\\.odissei\\.nl)"


# Collect every installation whose URL the regex does not match
unsupportedInstance <- list()
for (url in dvInstants$hostname) {
  url <- paste0("https://", url)
  if (!grepl(dvRegex, url)) {
    unsupportedInstance <- append(unsupportedInstance, url)
  }
}

print(unsupportedInstance)

The resulting list should be empty; otherwise it lists all unsupported hosts.
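
The same check can be sketched in JavaScript as a pure function. This is only a sketch under stated assumptions: `dvRegex` is the pattern from the R script transcribed to a JavaScript regex literal, `findUnsupported` is a hypothetical helper rather than part of the translator, and the hostnames are made-up `example.org` names instead of the fetched installation list.

```javascript
// Pattern from the R script above, transcribed to a JavaScript regex literal.
const dvRegex = /^https?:\/\/(www\.)?((open|research-?|hei|planetary-|osna|in|bonn|borealis|lida\.|archaeology\.|entrepot\.recherche\.|archive\.|redape\.)?(data|e?da[td]os)|dvn|sodha\.be|repositorio(\.|dedados|pesquisas)|darus|abacus\.library\.ubc\.ca|dorel\.univ-lorraine\.fr|dunas\.ua\.pt|edmond\.mpdl\.mpg\.de|keen\.zih\.tu-dresden\.de|rdr\.kuleuven\.be|portal\.odissei\.nl)/;

// Hypothetical helper: return the hostnames the pattern does not cover.
function findUnsupported(hostnames, pattern) {
  return hostnames.filter((host) => !pattern.test("https://" + host));
}

// Made-up hostnames for illustration only.
const hosts = ["dataverse.example.org", "darus.example.org", "repo.example.org"];
console.log(findUnsupported(hosts, dvRegex));
```

With these made-up hostnames, only `repo.example.org` is reported, because it starts with none of the prefixes the pattern accepts.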

@adam3smith adam3smith merged commit bcaa8b4 into zotero:master May 20, 2023
@adam3smith adam3smith deleted the Dataverse branch June 17, 2023 01:44