OSF Preprint: fix broken detection for multiple search results; unify search-result & individual-article scraping #3162

Open: wants to merge 7 commits into base: master

Commits on Oct 19, 2023

  1. OSF: Update search-page URLs/selectors; be less eager in monitoring the DOM

    - Update search-result page URL pattern and selectors for search-result
      entries.
    - Monitor DOM changes less eagerly: do so only when the URL indicates
      that the page could be a search-result page.
    - Update a test case to reflect a change in the canonical URL field of
      the output.
    zoe-translates committed Oct 19, 2023
    73bc386
  2. OSF: Unify the scraping of search results and individual preprints

    - Don't use a separate code path for scraping the current page; use the
      same scrape() function for both search results and the current page.
    - Asyncify the network requests
    - Clean up any HTML entities in the API-returned text fields (title,
      abstract, etc.)
    - Extract the "id" of individual preprints more reliably: it is the last
      segment of the URL path.
    - Overall reduction of code duplication
    - Update and add tests
    zoe-translates committed Oct 19, 2023
    08635c5
  3. OSF: Add an Accept: header to the API request just in case

    The API endpoint may respond with HTML (for human consumption) depending
    on a variety of factors. To prevent this, explicitly add an "Accept:"
    header to the request.
    zoe-translates committed Oct 19, 2023
    76929f9
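The id extraction, explicit Accept header, and HTML-entity cleanup described in the Oct 19 commits can be sketched in plain JavaScript as below. This is a minimal illustration under stated assumptions, not the translator's code: the helper names, the `https://api.osf.io/v2/preprints/<id>/` endpoint shape, and the `application/vnd.api+json` Accept value are all assumptions, and the real translator presumably uses Zotero's own utilities rather than a hand-rolled entity decoder.

```javascript
// Hypothetical helpers illustrating the approach described in the commits;
// names, the API base URL, and the Accept value are assumptions.

// The preprint id is taken to be the last non-empty segment of the URL path,
// so a trailing slash does not matter.
function preprintIdFromUrl(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  return segments.length ? segments[segments.length - 1] : null;
}

// Describe an API request that explicitly asks for JSON, so the server
// does not fall back to an HTML page meant for human readers.
function buildApiRequest(id) {
  return {
    url: `https://api.osf.io/v2/preprints/${encodeURIComponent(id)}/`,
    headers: { Accept: 'application/vnd.api+json' },
  };
}

// A small, case-sensitive table of named entities; a Map avoids matching
// inherited object keys such as "constructor".
const NAMED_ENTITIES = new Map([
  ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"],
]);

// Decode named and numeric entities in one pass over the text.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const code = body[1].toLowerCase() === 'x'
        ? parseInt(body.slice(2), 16)
        : parseInt(body.slice(1), 10);
      // Leave out-of-range code points untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    const decoded = NAMED_ENTITIES.get(body);
    return decoded !== undefined ? decoded : match;
  });
}
```

For example, `preprintIdFromUrl('https://osf.io/preprints/psyarxiv/abc12/')` yields `'abc12'`. Because decoding happens in a single `replace` pass, text such as `&amp;lt;` becomes `&lt;` rather than `<`, so already-escaped content is not double-decoded.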

Commits on Oct 27, 2023

  1. dce6216

Commits on Oct 28, 2023

  1. OSF Preprints: Update metadata fields for attachment and authors

    - The PDF attachment is named "Preprint PDF"
    - Use `ZU.cleanAuthor()` to normalize author names consistently with
      other translators
    zoe-translates committed Oct 28, 2023
    78fbbb5
  2. OSF Preprints: Simplify target/identification regexes

    The domain osf.io now hosts the discipline-specific projects, so only
    that domain needs to be matched.
    zoe-translates committed Oct 28, 2023
    185e3b1
  3. 1351404
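The URL-based gating from the first Oct 19 commit and the narrowed osf.io target from the Oct 28 regex simplification can be combined into a small URL classifier. The sketch assumes hypothetical path shapes (`/preprints/<provider>/<id>` for a single preprint, a trailing `/search` or `/discover` segment for result lists); OSF's actual routes and the translator's real regexes may differ. The `'multiple'` and `'preprint'` return values mirror what a Zotero translator's `detectWeb` conventionally returns.

```javascript
// Hypothetical sketch of URL-based page classification. The regexes and
// path shapes are assumptions for illustration, not the translator's own.

// After the simplification, only the osf.io host needs to match.
const TARGET_RE = /^https?:\/\/osf\.io\//;

// Assumed shape of search-result pages: a path ending in /search or /discover.
const SEARCH_PATH_RE = /\/(?:discover|search)\/?$/;

// Assumed shape of an individual preprint: /preprints/<provider>/<id>
const PREPRINT_PATH_RE = /^\/preprints\/[^/]+\/[a-z0-9]+\/?$/i;

// Return 'multiple' for result lists, 'preprint' for a single preprint,
// or null for anything else (including other hosts).
function classifyUrl(url) {
  if (!TARGET_RE.test(url)) return null;
  const { pathname } = new URL(url);
  // Check search pages first so ".../discover" is not read as a preprint id.
  if (SEARCH_PATH_RE.test(pathname)) return 'multiple';
  if (PREPRINT_PATH_RE.test(pathname)) return 'preprint';
  return null;
}

// Only pages that could be search results are worth watching for DOM
// changes, since their result lists may be rendered after page load.
function shouldMonitorDom(url) {
  return classifyUrl(url) === 'multiple';
}
```

Checking the search pattern before the single-preprint pattern keeps a path such as `/preprints/psyarxiv/discover` from being misread as a preprint whose id is `discover`, and gating DOM monitoring on the URL avoids watching every osf.io page for changes.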