Browser: Search Page #46

Closed
2 of 4 tasks
cookiengineer opened this issue Oct 22, 2020 · 3 comments

cookiengineer commented Oct 22, 2020

The stealth:search Page needs an Online and Offline search integration.

For now, the following search engines look promising because their APIs do not require tokens or user-specific authentication:

  • wiby.me can be integrated with a simple JSON request to https://wiby.me/json/?q=key%20words&o=15, while the first result page doesn't need the o=... parameter. Results are returned in batches of 15.

  • searx.me (and every other instance) actually has a well-documented API [1] that also supports JSON as a response format via https://searx.xyz/search?q=key%20words&format=json. Results are paginated, and the pageno parameter accepts values of 1 or higher. However, if no results are returned, the JSON is basically an empty array, and there seems to be no way to find out whether page 1 already includes all of the results.

  • The searx integration might need an engines list, passed as a comma-separated parameter in the request URL. The list of engines is quite large, but it is also documented [2].

  • The Web Archive API is currently unclear, because the only available information about it seems to be outdated. This probably requires investigating the source code used on web.archive.org.
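A minimal sketch of the request URLs described above, plus a pagination loop for searx. It assumes that wiby's o= parameter is a result offset in steps of 15 (so o=15 is the second page) and that an empty searx page is the only reliable stop signal; the function names, the `max_pages` cap, and the engine names in the usage note are hypothetical, not part of the Stealth codebase.

```python
from typing import Callable, Iterator, List, Optional, Sequence
from urllib.parse import quote, urlencode

WIBY_BATCH_SIZE = 15  # wiby.me returns results in batches of 15


def wiby_url(query: str, page: int = 0) -> str:
    """Build a wiby.me JSON URL; page 0 omits the o= offset parameter."""
    params = {"q": query}
    if page > 0:
        # Assumption: o= is a result offset in steps of the batch size (o=15 is page 2).
        params["o"] = str(page * WIBY_BATCH_SIZE)
    # quote_via=quote encodes spaces as %20, matching the URLs in this issue.
    return "https://wiby.me/json/?" + urlencode(params, quote_via=quote)


def searx_url(instance: str, query: str, pageno: int = 1,
              engines: Optional[Sequence[str]] = None) -> str:
    """Build a searx JSON search URL for any instance base URL."""
    if pageno < 1:
        raise ValueError("pageno must be 1 or higher")
    params = {"q": query, "format": "json", "pageno": str(pageno)}
    if engines:
        # Engines are passed as a single comma-separated parameter.
        params["engines"] = ",".join(engines)
    return instance.rstrip("/") + "/search?" + urlencode(params, quote_via=quote)


def iter_searx_pages(fetch_page: Callable[[int], List[dict]],
                     max_pages: int = 10) -> Iterator[List[dict]]:
    """Yield result pages until an empty page is returned.

    searx reports no total result count, so an empty page is the only stop
    signal; max_pages is a hypothetical safety cap against endless paging.
    """
    for pageno in range(1, max_pages + 1):
        results = fetch_page(pageno)
        if not results:
            break
        yield results
```

For example, `searx_url("https://searx.xyz", "key words", pageno=2, engines=["duckduckgo", "wikipedia"])` builds a page-2 request restricted to two engines. Injecting `fetch_page` keeps the paging logic independent of the HTTP client, so it can be tested without network access.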

[1] Search API
[2] Search Engines

cookiengineer added this to the X0 - Codename Spirit milestone Oct 22, 2020
cookiengineer commented:

The list of searx instances shown on https://searx.space is also available as a JSON file at https://searx.space/data/instances.json.
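A sketch of reading that file into a list of instance base URLs. The schema isn't described in this thread, so this assumes the top-level object holds an `instances` mapping keyed by instance URL; verify that against the live file before relying on it. Both helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

INSTANCES_URL = "https://searx.space/data/instances.json"


def parse_instance_urls(raw: str) -> List[str]:
    """Return sorted instance URLs from the instances.json text.

    Assumption: the JSON looks like {"instances": {"<url>": {...}, ...}, ...}.
    """
    data = json.loads(raw)
    return sorted(data.get("instances", {}).keys())


def fetch_instance_urls(timeout: float = 10.0) -> List[str]:
    """Download instances.json and extract the instance URLs (needs network)."""
    with urllib.request.urlopen(INSTANCES_URL, timeout=timeout) as resp:
        return parse_instance_urls(resp.read().decode("utf-8"))
```

Keeping parsing separate from downloading means a cached copy of the file can be used by the offline search integration as well.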

2075 commented Oct 22, 2020

I did not know about searx, awesome! It should be integrated at the OS level.

cookiengineer commented Oct 25, 2020

After two days of investigation, the Search Page has been implemented in a rudimentary manner.

The Web Archive's advancedsearch.php does not allow searching for keywords, only for specific URLs. The regular search.php would theoretically support a keyword search, but it only returns several MB of HTML. So, for now, the Web Archive API is of no use for the Search Page.

The wiby.me API seems to be rate-limited and rejects requests that don't carry a faked User-Agent string, which is odd. This needs further investigation in the future, but for now the results from the searx integration are good enough.
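If the wiby.me integration is revisited, a client would presumably need to send a browser-like User-Agent and space out its requests. The sketch below shows both, assuming a fixed minimum interval between requests is enough to stay under the (unknown) rate limit; the `Throttle` class, `build_wiby_request`, and the User-Agent string are all hypothetical.

```python
import time
import urllib.request
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between requests (a guess at wiby.me's limit)."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Block until the next request is allowed; return the delay applied."""
        now = self.clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay > 0:
                self.sleep(delay)
        self._last = now + delay
        return delay


def build_wiby_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a request carrying an explicit User-Agent header (no network I/O)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

A caller would call `throttle.wait()` before each `urllib.request.urlopen(build_wiby_request(...))`. Injecting `clock` and `sleep` keeps the throttle testable without real delays.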

However, the redirect of https://web.archive.org/*/<complete url> can easily be used to determine whether an archived version of the page is available. The issue for this is #19 (stealth:fix-request Page).

cookiengineer self-assigned this Oct 25, 2020