add(Soopat.js): New translator for a patent site www.soopat.com #3007

l0o0 · 2023-03-28T04:48:37Z

Soopat.com is a patent database based in China that is widely used by Chinese users. It collects patents from all around the world, not just from China, and allows users to search and download patent applications and authorization PDFs. Pro users can access more detailed data on this site.

The translator extracts metadata from HTML elements rather than using an API. The data in HTML is well-formatted and easy to read.

Please be aware that the pro version requires an account to access. I was given a private account to create this translator.

zoe-translates · 2023-03-28T05:39:37Z

Soopat.js

+		if (selectedItems) {
+			var urls = Object.keys(selectedItems);
+			Z.debug(urls[0]);
+			await Promise.all(


This will initiate the requests asynchronously, which in practice means almost simultaneously. Some sites don't like this (because of the botty behaviour). Is this perhaps related to the Captcha issue?

You are right: frequent requests will trigger a Captcha for free users, so scraping many items from the search page is not recommended. How about fetching the pages one by one in a for loop?

if (selectedItems) {
	var urls = Object.keys(selectedItems);
	// Fetch and scrape one page at a time instead of all at once
	for (let url of urls) {
		let html = await requestDocument(url);
		await scrape(html, url, loginStatus);
	}
}

Yeah, we almost always want to await each request in turn instead of using Promise.all(), for this reason. (And if that's not enough, we can consider artificial delays.)
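
For illustration, the "sequential requests plus artificial delay" idea could look roughly like the sketch below. This is only a sketch under assumptions: `sleep` and `processSequentially` are hypothetical helper names that are not part of the translator framework, the 1000 ms default is an arbitrary placeholder, and it assumes `setTimeout` is available in the environment where the code runs.

```javascript
// Hypothetical helper (an assumption, not a translator API):
// resolves after roughly `ms` milliseconds.
function sleep(ms) {
	return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper: runs `handler` on each URL strictly one at a time,
// pausing `delayMs` between consecutive calls (no pause after the last one).
// Returns the handler results in the same order as `urls`.
async function processSequentially(urls, handler, delayMs = 1000) {
	const results = [];
	for (let i = 0; i < urls.length; i++) {
		// The next request does not start until this one has finished
		results.push(await handler(urls[i]));
		if (i < urls.length - 1) {
			await sleep(delayMs);
		}
	}
	return results;
}
```

In the translator, the handler would presumably wrap the calls from the loop above, e.g. `url => requestDocument(url).then(doc => scrape(doc, url, loginStatus))`. Whether any delay value is enough to avoid the Captcha for free users is untested.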

Soopat seems to have a strict anti-scraping policy for free users. Sometimes I have to enter a Captcha right after merely refreshing the page. I think there is little we can do on Zotero's side.

add(Soopat.js): New translator for a patent site www.soopat.com

c5fbfba

zoe-translates reviewed Mar 28, 2023

l0o0 closed this by deleting the head repository Nov 6, 2023

l0o0 commented Mar 28, 2023

zoe-translates Mar 28, 2023

l0o0 Mar 28, 2023

dstillman Mar 28, 2023

l0o0 Mar 28, 2023
