Enable Web Archivist to override robots.txt setting from W3ACT #503
Comments
w3act already specifically allows Archivists to change the policy setting. Apparently the issue is that the flag is being ignored at a later part of the process outside w3act. I've asked @anjackson to clarify requirements.
Yes, currently neither the production crawl launcher (w3start.py) nor the new one (launch.py) actually modifies the crawl to ignore robots.txt. We have the script fragments we need for this, but we still need to finish the work. The basic processes are in https://github.com/ukwa/python-shepherd/blob/master/python-w3act/w3act/job.py but I'd rather move to the h3cc script approach currently under development here: https://github.com/ukwa/python-shepherd/blob/master/agents/h3cc.py (I've added an issue there, but we can leave this issue open until the end-to-end process is proven).
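To illustrate the missing step, here is a minimal sketch of how a launcher might turn the per-target flag into a robots policy choice. Everything here is an assumption for illustration: the `Target` class, its `ignore_robots_txt` field, and the `robots_policy_for` and `plan_crawl` helpers are hypothetical and are not taken from w3start.py, launch.py, or job.py. The `obey` and `ignore` strings follow the robots policy names used by Heritrix 3, but how the launcher would actually apply them to a running crawl is left open.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical stand-in for a W3ACT target record."""
    url: str
    ignore_robots_txt: bool = False  # assumed name for the 'Ignore Robots.txt' flag


def robots_policy_for(target: Target) -> str:
    """Pick a robots policy name for one target (assumed mapping)."""
    return "ignore" if target.ignore_robots_txt else "obey"


def plan_crawl(targets: list[Target]) -> dict[str, str]:
    """Map each seed URL to the policy the launcher would request for it."""
    return {t.url: robots_policy_for(t) for t in targets}


if __name__ == "__main__":
    targets = [
        Target("http://example.org/"),
        Target("http://example.com/", ignore_robots_txt=True),
    ]
    for url, policy in plan_crawl(targets).items():
        print(url, policy)
```

Keeping the decision in one small function would make it easy to check that the flag set in the Target record actually reaches the crawl configuration, which is the part reported as being ignored.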
Please contact us to work around this issue manually for now. Sorry!
Not actually a W3ACT change - this is down to deploying the new crawl engine: https://github.com/ukwa/ukwa-heritrix
Just checking the code, and the current W3ACT does allow 'expert users' to change this flag. This will be changed to restrict it to Archivists and SysAdmins.
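The intended restriction can be sketched as a simple role check. This is only an illustration under stated assumptions: the role strings and the `can_override_robots` helper are hypothetical and do not reflect how W3ACT actually stores or checks roles.

```python
# Hypothetical role names; W3ACT's real role identifiers may differ.
ALLOWED_ROLES = {"archivist", "sysadmin"}


def can_override_robots(user_roles) -> bool:
    """Return True if any of the user's roles may change the robots.txt flag."""
    return bool(ALLOWED_ROLES & set(user_roles))


if __name__ == "__main__":
    print(can_override_robots(["expert_user"]))
    print(can_override_robots(["expert_user", "archivist"]))
```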
Also, note that the crawl engine should be obeying the
Could the override functionality be enabled for Archivists, please, so that we can change the robots.txt policy at Target level? The placeholder for this is already in the Target record: Target > Crawl policy and Schedule > Ignore Robots.txt. Please restrict it to the Archivist role and higher.