Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dspace harvester first version #4

Merged
merged 6 commits into from May 2, 2022
Merged

Conversation

joseph6x
Copy link
Collaborator

Dspace harvester first version

Copy link

@chenejac chenejac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joseph6x nice work. I have posted some comments, please read it and consider addressing some of them before you merge your PR to the main branch. Thanks for the nice work.

<artifactId>org.vivoweb.dspacevivo.model.openapi</artifactId>
<version>0.0.1-SNAPSHOT</version>
</dependency>
</dependencies>
</project>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add new line at the end of file


public abstract void connect();
public abstract Iterator<Item> harvestItems();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line at the end of file is missing

}
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line

Comment on lines 137 to 142






Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiple empty lines

Comment on lines 36 to 37
System.out.println("ºººººººººººººº");
System.out.println(origin.getConf().getProperty("uriPrefix"));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use logs, not printing in console.

import java.util.Optional;
import org.vivoweb.dspacevivo.model.Item;

public class ItemItr implements Iterator<Item> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why do we need this class, but anyway its implementation should be improved. Beginning of hasNext is confusing, and the reason for that is overriding of next() method. List is also iterable, can we use next of that, and not removing elements from the list. Or it is implemented as that for the need of cleaning memory consumption (OAI supports resumption token). However, please consider redesigning of this class.

this.ignoreSSLWarnings = ignoreSSLWarnings;
}

}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line

}

public Optional<String> getResumptionToken() {
Optional<String> vl = Optional.empty();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is v1 abbreviation for?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verb number 1???

</xsl:for-each>
</oai_dc>
</xsl:template>
</xsl:stylesheet>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new line

Copy link

@chenejac chenejac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joseph6x please check my comments. Well done!!!

}

@Override
public Community next() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I don't like how this is implemented.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get the data from Dspace multiple OAI requests have to be performed. Each OAI response returns a list (page) of elements e.g. Items, Communities, Collections. In these Iterator classes I am converting the responses from pages to a stream of elements, so the complexity of pagination in the OAI requests are hidden behind the Iterator class. this is the reason why I am using a temporary list to store the results of the current page and in the hasNext method the OAI requests are executed when the temporary list is empty. Please let me know if you have suggestions on how to improve it

while (harvestItemsItr.hasNext()) {
count++;
Item next = harvestItemsItr.next();
System.out.println("new Item harvested...");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use loggers instead of printing to console?

@joseph6x
Copy link
Collaborator Author

joseph6x commented May 2, 2022

Hi @chenejac . I have just pushed the last version of the harvester. Please let me know if I can merge the PR.

@chenejac chenejac merged commit fbff7f4 into vivo-community:main May 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants