fix: introduce measures to avoid bots crawling and indexing activities #5728

Merged
adlerhurst merged 8 commits into zitadel:main on May 5, 2023

Conversation

doncicuto
Contributor

Add robots.txt, a robots meta tag, and an X-Robots-Tag header to keep bots from crawling and indexing a Zitadel instance.

Acceptance Criteria

  • We don't want bots to crawl instance URLs, so the instance serves a robots.txt.
  • We don't want bots that follow links to the Console or Login pages to index their content, so the robots meta tag is set to none (none is equivalent to noindex, nofollow).
  • We also don't want REST endpoints to be indexed, so responses carry the header "X-Robots-Tag: none".

Here are some screenshots:

HTML source code
meta_robots_none_2
meta_robots_none

API Response Headers
x_robots_tag

Robots.txt found
robots_txt

@doncicuto doncicuto changed the title from "Introduce measures to avoid bots crawling and indexing activities" to "fix: introduce measures to avoid bots crawling and indexing activities" on Apr 22, 2023
@codecov

codecov bot commented Apr 22, 2023

Codecov Report

Merging #5728 (59f2f76) into main (11f0f54) will increase coverage by 0.00%.
The diff coverage is 65.51%.

@@           Coverage Diff           @@
##             main    #5728   +/-   ##
=======================================
  Coverage   44.40%   44.41%           
=======================================
  Files        1173     1175    +2     
  Lines      103086   103114   +28     
=======================================
+ Hits        45778    45799   +21     
- Misses      55159    55166    +7     
  Partials     2149     2149           
| Impacted Files | Coverage Δ |
|---|---|
| internal/api/http/header.go | 12.00% <ø> (ø) |
| internal/api/grpc/server/gateway.go | 52.32% <12.50%> (-4.09%) ⬇️ |
| cmd/start/start.go | 58.02% <50.00%> (-0.03%) ⬇️ |
| internal/api/api.go | 65.71% <100.00%> (+0.24%) ⬆️ |
| ...nal/api/grpc/server/middleware/auth_interceptor.go | 100.00% <100.00%> (ø) |
| ...rnal/api/http/middleware/robots_tag_interceptor.go | 100.00% <100.00%> (ø) |
| internal/api/robots_txt/robots_txt.go | 100.00% <100.00%> (ø) |

... and 2 files with indirect coverage changes

@hifabienne
Member

Thanks @doncicuto
We are currently finishing up the current sprint and will then have a look at it.

@peintnermax
Member

IMHO we can disallow robots on our login page, but wouldn't disabling them in our console also make our other meta tags, which are used to render a preview in chats and on social media, unnecessary? 🤔

@hifabienne
Member

> IMHO we can disallow robots on our login page, but wouldn't disabling them in our console also make our other meta tags, which are used to render a preview in chats and on social media, unnecessary? 🤔

Who are you asking? 😃

@peintnermax
Member

> > IMHO we can disallow robots on our login page, but wouldn't disabling them in our console also make our other meta tags, which are used to render a preview in chats and on social media, unnecessary? 🤔
>
> Who are you asking? 😃

Anyone who can answer it 😀 IMO the robots meta tag prevents the other meta tags from being read, so a preview would no longer be generated.
Screenshot 2023-04-25 at 15 24 16

@peintnermax
Member

The PR and console changes look good to me. Maybe @adlerhurst or @livio-a can review the golang code

@livio-a
Member

livio-a commented Apr 26, 2023

> The PR and console changes look good to me. Maybe @adlerhurst or @livio-a can review the golang code

I'll try to check this afternoon, or tomorrow at the latest.

@livio-a livio-a self-requested a review May 2, 2023 09:30
@livio-a livio-a self-assigned this May 2, 2023
Member

@adlerhurst adlerhurst left a comment

@doncicuto
Contributor Author

doncicuto commented May 2, 2023

Hi @adlerhurst, I've added the x-robots-tag to grpcwebserver as requested, thanks for the review

grpcweb_x_robots_tag

adlerhurst previously approved these changes May 5, 2023
@adlerhurst
Member

hi @doncicuto

Thanks for your contribution. 🙏🏻
Looks good to me.

@adlerhurst adlerhurst merged commit 3ca7147 into zitadel:main May 5, 2023
10 checks passed
@github-actions

🎉 This PR is included in version 2.24.0-ignore-me2.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

5 participants