OSF Preprint: fix broken detection for multiple search results; unify search-result & individual-article scraping #3162
base: master
Commits on Oct 19, 2023
OSF: Update search page URLs/selectors; be less eager in monitoring DOM
- Update the search-result page URL pattern and the selectors for search-result entries.
- Be less eager in monitoring DOM changes: do this only when the URL indicates the page could be a search-result page.
- Update a test case to reflect a change in the canonical URL field of the output.
Commit 73bc386
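The change that monitors the DOM only when the URL suggests a search-result page can be sketched as follows. This is a minimal illustration, not the translator's code: the `/search` path, the function names `couldBeSearchPage` and `maybeMonitorDom`, and the use of a plain `MutationObserver` are all assumptions.

```javascript
// Hypothetical sketch: the "/search" path pattern is an assumption,
// not the pattern used by the actual OSF Preprints translator.
const SEARCH_PATH_RE = /^\/search\/?$/;

// Cheap URL-only check that runs before any DOM monitoring is set up.
function couldBeSearchPage(urlString) {
	const url = new URL(urlString);
	return url.hostname === "osf.io" && SEARCH_PATH_RE.test(url.pathname);
}

// Attach a MutationObserver (browser-only) only when the URL passes the check.
// Returns the observer, or null when monitoring was skipped.
function maybeMonitorDom(doc, urlString, onChange) {
	if (!couldBeSearchPage(urlString)) {
		return null; // Not a possible search page: skip monitoring entirely.
	}
	const observer = new doc.defaultView.MutationObserver(onChange);
	observer.observe(doc.body, { childList: true, subtree: true });
	return observer;
}
```

Because the URL check happens first, individual preprint pages never pay for an observer.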
OSF: Unify the scraping of search results and individual preprints
- Don't use a separate code path for scraping the current page; use the same scrape() function for both search results and the current page.
- Asyncify the network requests.
- Clean up any HTML entities in the API-returned text fields (title, abstract, etc.).
- Extract the "id" of individual preprints more reliably: it is the last segment of the path.
- Reduce code duplication overall.
- Update and add tests.
Commit 08635c5
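Two of the points in this commit, taking the preprint "id" from the last path segment and cleaning HTML entities out of API text fields, might look roughly like this. Both helpers (`getPreprintId`, `decodeEntities`) are hypothetical; a real translator would more likely rely on Zotero's built-in unescaping helper (e.g. `ZU.unescapeHTML()`) than on the small decoder shown here, which only knows a handful of named entities.

```javascript
// Hypothetical helper: the preprint id is the last non-empty path segment,
// so a trailing slash does not break extraction.
function getPreprintId(urlString) {
	const segments = new URL(urlString).pathname.split("/").filter(Boolean);
	return segments.length ? segments[segments.length - 1] : null;
}

// Minimal decoder for a few named entities plus decimal/hex numeric references.
const NAMED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00a0" };

function decodeEntities(text) {
	return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
		if (body[0] === "#") {
			const isHex = body[1].toLowerCase() === "x";
			const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
			// Leave out-of-range references untouched instead of throwing.
			return code <= 0x10ffff ? String.fromCodePoint(code) : match;
		}
		const named = NAMED_ENTITIES[body.toLowerCase()];
		return named === undefined ? match : named; // Unknown names are kept as-is.
	});
}
```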
OSF: Add Accept: header to API request just in case
The API endpoint may respond with HTML (meant for human consumption) depending on a variety of factors. To prevent this, explicitly add an "Accept:" header to the request.
Commit 76929f9
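A rough sketch of sending an explicit `Accept:` header with an asynchronous API request follows. The base URL, the JSON:API media type, and the names `buildPreprintRequest`/`fetchPreprint` are assumptions for illustration; the commit does not state which header value the translator sends. The fetch function is injectable so the request can be inspected without network access.

```javascript
// Assumed values: neither the base URL nor the media type comes from the PR.
const API_BASE = "https://api.osf.io/v2/preprints/";
const ACCEPT_JSON = "application/vnd.api+json";

// Build the request so the endpoint is asked for JSON rather than HTML.
function buildPreprintRequest(id) {
	return {
		url: API_BASE + encodeURIComponent(id) + "/",
		headers: { Accept: ACCEPT_JSON },
	};
}

// Hypothetical async fetch; fetchImpl defaults to the global fetch (Node 18+ / browsers).
async function fetchPreprint(id, fetchImpl = fetch) {
	const { url, headers } = buildPreprintRequest(id);
	const response = await fetchImpl(url, { headers });
	if (!response.ok) {
		throw new Error(`HTTP ${response.status} for ${url}`);
	}
	return response.json();
}
```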
Commits on Oct 27, 2023
Commit dce6216
Commits on Oct 28, 2023
OSF Preprints: Update metadata fields for attachment and authors
- Name the PDF attachment "Preprint PDF".
- Use `ZU.cleanAuthor()` to normalize author names, consistent with other translators.
Commit 78fbbb5
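To illustrate the shape of the output this commit aims for, here is a simplified sketch. `cleanAuthorSketch` is a hypothetical stand-in for `ZU.cleanAuthor(name, "author")`, which is only available inside Zotero and handles far more name formats; this version merely splits on the last space. `buildAttachments` is likewise hypothetical and only shows the "Preprint PDF" title.

```javascript
// Simplified stand-in for ZU.cleanAuthor(name, "author"); NOT the real implementation.
// It collapses whitespace and treats the last word as the last name.
function cleanAuthorSketch(fullName, creatorType = "author") {
	const name = fullName.trim().replace(/\s+/g, " ");
	const cut = name.lastIndexOf(" ");
	if (cut === -1) {
		return { firstName: "", lastName: name, creatorType };
	}
	return { firstName: name.slice(0, cut), lastName: name.slice(cut + 1), creatorType };
}

// Hypothetical helper: attach the PDF (if any) under the title "Preprint PDF".
function buildAttachments(pdfUrl) {
	return pdfUrl
		? [{ url: pdfUrl, title: "Preprint PDF", mimeType: "application/pdf" }]
		: [];
}
```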
OSF Preprints: Simplify target/identification regexes
The osf.io domain now hosts the discipline-specific projects, so the regexes only need to match that domain.
Commit 185e3b1
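A simplified target regex that matches only osf.io could look like the following. The exact pattern in the translator may differ; `TARGET_RE` and `isOsfUrl` are illustrative names. Anchoring at the start and requiring the `/` after the host keeps look-alike hosts such as `osf.io.example.com` from matching.

```javascript
// Hypothetical simplified target regex: only URLs on the osf.io host match.
const TARGET_RE = /^https?:\/\/osf\.io\//;

function isOsfUrl(url) {
	return TARGET_RE.test(url);
}
```

Note that this sketch would not match a `www.` prefix; whether that matters depends on how OSF serves its pages.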
Commit 1351404