Instance-wide search engine opt-out #11750

Closed
brortao opened this issue Sep 3, 2019 · 6 comments · Fixed by #11804

Comments

@brortao
Contributor

commented Sep 3, 2019

Pitch

It’d be useful if instance admins could flip a toggle to opt the entire instance out of search engine indexing through robots.txt.

Motivation

It’s already possible for individual users to do this through their settings. Often, an instance admin can change their nginx config to serve a custom robots.txt, but many don’t have the technical expertise or don’t have shell access (e.g. if they run their instance through a third-party provider).

Now that we’re seeing fediverse search engines (https://search.social/), some instances may want to opt out. There’s currently no way for an entire instance hosted on e.g. masto.host to opt out — instead, every single user has to do so.

@hrefhref

commented Sep 3, 2019

Actually, I think the best way to do this is to allow admins to set a default for every profile. Right now the user level noindex defaults to false, but an instance could change it.

If an instance sets a robots.txt, indexers would be forced to ignore all of the users of the domain.

@Gargron

Member

commented Sep 3, 2019

robots.txt isn't actually an opt-out of search engines. It asks bots not to visit specific pages, but search engines may still include links to those pages in their index. The only directive that specifically asks for exclusion from indexes is the "noindex" meta tag. Counterintuitively, if robots.txt prevents the crawling of a page, the search engine will never see the noindex tag, and may include that page in the index when it's linked from somewhere else.
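The interaction described above can be illustrated with a minimal sketch of a well-behaved crawler. It only uses Python's standard library; the function name `crawler_sees_noindex`, the `ExampleBot` user agent, and the example URL are hypothetical and chosen for illustration, not taken from Mastodon or from any real search engine.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def crawler_sees_noindex(robots_txt, url, html, agent="ExampleBot"):
    """Return True only if a polite crawler would fetch the page AND find noindex.

    Hypothetical helper: `html` stands in for the body the crawler would
    download if robots.txt allowed it.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # The page is never downloaded, so its meta tags stay invisible.
        return False
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex"></head></html>'
url = "https://example.social/@alice"  # hypothetical profile URL

allowed = crawler_sees_noindex("User-agent: *\nAllow: /", url, page)
blocked = crawler_sees_noindex("User-agent: *\nDisallow: /", url, page)
```

With crawling allowed, the crawler reads the tag and can honor it; with `Disallow: /`, it never sees the tag, which is why a blanket robots.txt rule can work against a noindex preference.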

That is to say, you should not spread the recommendation to change robots.txt as a solution to search engine opt-outs.

Sysadmins can edit the config/settings.yml file to change the default value of user-level noindex preference. That path is not available to masto.host users, however.

@jeroenpraat

Contributor

commented Sep 3, 2019

@Gargron At the top of config/settings.yml I just read 'This file contains default values, and does not need to be edited. All important settings can be changed from the admin interface.'.

I could not find the noindex option in the admin interface. Maybe an idea to add it?

@brortao

Contributor Author

commented Sep 3, 2019

@Gargron i didn't know that, thank you for clarifying! in that case, i suppose this feature suggestion is just adding an admin interface toggle for the default user-level noindex value.

if you think it's a good idea, i'm happy to open a PR -- it seems like a gentle way to start working with the mastodon codebase :)

@Gargron

Member

commented Sep 3, 2019

> I could not find the noindex option in the admin interface. Maybe an idea to add it?

Yes, indeed. Because the "site settings" page is getting too big, perhaps a different page for user-preference defaults should be added.

@bremensaki

commented Sep 5, 2019

A side-note that may warrant a new topic - if we're talking about flexibly adding "noindex" tags to user profiles, all profiles should probably have one on by default until, say, they meet the minimum requirements for profile directory inclusion. That's a minimum follower count and age restriction, isn't it? Only after that point should the preference come into play.

This would remove the incentive for those bots that create profiles, add some links for whatever SEO junk they're doing, and never come back again. As they never post, they quite often fly below the radar, and I don't pick them up until they start appearing in search result referrals for weird terms.
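The rule proposed in this comment could be sketched roughly as follows. This is only an illustration of the idea, not Mastodon code: the `Profile` type, the `should_emit_noindex` function, and both threshold values are hypothetical, since the actual directory requirements are not stated in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds; the real directory requirements may differ.
MIN_FOLLOWERS = 10
MIN_ACCOUNT_AGE = timedelta(days=30)


@dataclass
class Profile:
    followers: int
    created_at: datetime
    noindex_preference: bool  # the user-level setting discussed in this issue


def should_emit_noindex(profile, now):
    """Force noindex until the profile qualifies, then defer to the preference."""
    eligible = (
        profile.followers >= MIN_FOLLOWERS
        and now - profile.created_at >= MIN_ACCOUNT_AGE
    )
    if not eligible:
        # New or low-follower accounts are always tagged noindex.
        return True
    return profile.noindex_preference


now = datetime(2019, 9, 5)
fresh_spam = Profile(followers=0, created_at=datetime(2019, 9, 1), noindex_preference=False)
established = Profile(followers=50, created_at=datetime(2018, 1, 1), noindex_preference=False)
```

Under these assumed thresholds, `fresh_spam` would carry a noindex tag regardless of its setting, while `established` would be indexed or not according to its own preference.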
