
Disallow forms (with CAPTCHA) to bots #3936

Merged 2 commits into vivo-project:main on Jun 14, 2024

Conversation

@chenejac (Contributor) commented on Jan 18, 2024

VIVO GitHub issue: 3935

Linked Vitro PR

What does this pull request do?

Disallow bot access to /contact and /forgotPassword (at least for bots that respect robots.txt)

What's new?

robots.txt is updated
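The merged diff is authoritative, but based on the paths exercised in the reviews below, the added rules are presumably along these lines:

```
# Sketch of the robots.txt additions (presumed from the paths tested below)
User-agent: *
Disallow: /contact
Disallow: /forgotPassword
Disallow: /submitFeedback
```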

How should this be tested?

Run VIVO and try to access /contact and /forgotPassword from a web browser (this should work), then test the robots.txt file with a robots.txt validator. Please note that VIVO must run at a public address as the root application (it should be http://somedomain.com, not http://somedomain.com/vivo).
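For a scripted variant of this check (a minimal sketch; the host http://somedomain.com and the paths simply mirror the description above), Python's standard-library robotparser shows what a compliant crawler would do:

```python
# Minimal sketch: check the served robots.txt with Python's stdlib parser.
# Assumes VIVO runs as the root application at http://somedomain.com.
from urllib import robotparser

BASE = "http://somedomain.com"

rp = robotparser.RobotFileParser(BASE + "/robots.txt")
rp.read()  # fetch and parse the live robots.txt

# A crawler that respects robots.txt should be refused on the form pages...
for path in ("/contact", "/forgotPassword", "/submitFeedback"):
    print(path, "->", rp.can_fetch("Googlebot", BASE + path))  # expect False

# ...while the home page stays crawlable.
print("/ ->", rp.can_fetch("Googlebot", BASE + "/"))  # expect True
```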

Interested parties

Tag (@ mention) interested parties or, if unsure, @VIVO-project/vivo-committers

@chenejac linked an issue on Jan 18, 2024 that may be closed by this pull request
@wwelling (Contributor) previously approved these changes on Jan 19, 2024

@ivanmrsulja (Member) left a comment

Works as intended. But keep in mind that the rules in robots.txt cannot enforce crawler behavior; they only suggest it 😄. To stop crawlers from accessing these pages we could manually whitelist user agents on the backend, but I don't think that is needed.
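To make the advisory nature concrete: the server itself never consults robots.txt, so a client that skips it can fetch a disallowed page without any pushback. A minimal sketch (assuming a VIVO instance at http://somedomain.com):

```python
# Minimal sketch: robots.txt is not enforced server-side.
# Assumes a VIVO instance at http://somedomain.com.
import urllib.request

# /contact is disallowed in robots.txt, but a client that never reads
# robots.txt can still request it; the server answers normally.
with urllib.request.urlopen("http://somedomain.com/contact") as resp:
    print(resp.status)  # 200: the page is served despite the Disallow rule
```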

@milospp (Contributor) commented on Jan 23, 2024

> Works as intended. But keep in mind that the rules in robots.txt cannot enforce crawler behavior; they only suggest it 😄. To stop crawlers from accessing these pages we could manually whitelist user agents on the backend, but I don't think that is needed.

I don't think we should manually detect robots and disable those pages, because we cannot be 100% sure which client is a robot just from its headers. That's why we have CAPTCHA on those pages.

@ivanmrsulja (Member) left a comment

One change requested as in: vivo-project/Vitro#438 (review)

webapp/src/main/webapp/robots.txt (review comment, outdated and resolved)
Co-authored-by: Ivan R. Mršulja <nighteliteace@gmail.com>
@ivanmrsulja (Member) left a comment

I have re-run all the tests using the Merkle robots.txt tester and now everything works as intended. Steps to reproduce the tests:

  • Set up a publicly available VIVO server (I recommend using a tool like ngrok; see the sketch below)
  • If VIVO does not run at the root URL and you instead have to go to /vivo or something similar, paste the robots.txt content manually into the tester's text editor
  • Choose a crawler from the dropdown menu and try to fetch any of the disallowed paths
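For the first step, exposing a locally running VIVO is a one-liner with ngrok (assuming Tomcat serves VIVO on port 8080; adjust the port to your setup):

```sh
# Tunnel the local Tomcat port to a public URL; ngrok prints the
# public address to paste into the robots.txt tester.
ngrok http 8080
```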

@balmas left a comment

This works as advertised. Tested with Merkle and confirmed that /submitFeedback, /contact, and /forgotPassword are all disallowed.

@litvinovg merged commit 0fa85e8 into vivo-project:main on Jun 14, 2024
4 checks passed
Successfully merging this pull request may close: Update of robots.txt (#3935)
6 participants