Causing SEO confusion by setting canonical to true #866

Closed
5 of 11 tasks
siaimes opened this issue May 5, 2019 · 14 comments · Fixed by #1143

Comments

@siaimes

siaimes commented May 5, 2019

I agree and want to create new issue


My Hexo config permalink is :year/:month/:day/:title.html. When I set canonical to true, my pages are generated with a canonical tag that has no .html suffix. However, all of my pages are served with the .html suffix, so the canonical URL and the actual page URL differ, and Google Webmaster Tools shows that the traffic of the two URLs is not merged.

In addition, my URLs have been in use for a long time and cannot be changed. NexT seems to have only recently implemented canonical, so I hope you can support this situation.


Expected behavior

<link rel="canonical" href="https://blog.xxxx.me/year/month/day/title.html">

Actual behavior

<link rel="canonical" href="https://blog.xxxx.me/year/month/day/title">

Steps to reproduce the behavior

  1. hexo config file: permalink: :year/:month/:day/:title.html
  2. next config file: canonical: true
  3. Generate webpage
  4. Retrieve canonical tags from web pages
  • Link to demo site with this bug: N/A
  • Link(s) to source code or any useful link(s): N/A
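
Step 4 can be checked with a small script. This is a minimal sketch, not part of Hexo or NexT: `extractCanonical` and `canonicalMatches` are hypothetical helper names, and the regular expression assumes the tag is written with `rel` before `href`, as in the two tags quoted above.

```javascript
// Hypothetical helper: pull the canonical href out of a generated HTML page.
// Assumes rel comes before href, as in the tags quoted in this issue.
function extractCanonical(html) {
  const match = html.match(/<link\s+rel="canonical"\s+href="([^"]*)"/i);
  return match ? match[1] : null;
}

// Hypothetical helper: true when the canonical tag points at the URL
// the page is actually served from.
function canonicalMatches(pageUrl, html) {
  return extractCanonical(html) === pageUrl;
}

const pageUrl = 'https://blog.xxxx.me/year/month/day/title.html';
const actual = '<link rel="canonical" href="https://blog.xxxx.me/year/month/day/title">';
const expected = '<link rel="canonical" href="https://blog.xxxx.me/year/month/day/title.html">';

console.log(canonicalMatches(pageUrl, actual));   // false
console.log(canonicalMatches(pageUrl, expected)); // true
```

A strict string comparison is used on purpose here, since the complaint is that the two URLs are not identical.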

Node.js and NPM Information

$ node -v && npm -v
v8.12.0
6.4.1

Package dependencies Information

$ cat package.json
{
  "name": "hexo-site",
  "version": "0.0.0",
  "private": true,
  "hexo": {
    "version": "3.8.0"
  },
  "dependencies": {
    "braces": "^3.0.1",
    "eslint": "^5.16.0",
    "hexo": "^3.8.0",
    "hexo-deployer-git": "^1.0.0",
    "hexo-deployer-rsync": "^0.1.3",
    "hexo-fs": "^1.0.2",
    "hexo-generator-archive": "^0.1.4",
    "hexo-generator-baidu-sitemap": "^0.1.6",
    "hexo-generator-category": "^0.1.3",
    "hexo-generator-index": "^0.2.0",
    "hexo-generator-search": "^2.2.5",
    "hexo-generator-sitemap": "^1.2.0",
    "hexo-generator-tag": "^0.2.0",
    "hexo-recommended-posts": "^1.0.3",
    "hexo-renderer-ejs": "^0.3.0",
    "hexo-renderer-pandoc": "^0.2.5",
    "hexo-renderer-stylus": "^0.3.1",
    "hexo-server": "^0.2.2"
  }
}

Hexo Information

Hexo version

$ hexo -v
hexo: 3.8.0
hexo-cli: 1.1.0
os: Windows_NT 10.0.17763 win32 x64
http_parser: 2.8.0
node: 8.12.0
v8: 6.2.414.66
uv: 1.19.2
zlib: 1.2.11
ares: 1.10.1-DEV
modules: 57
nghttp2: 1.32.0
napi: 3
openssl: 1.0.2p
icu: 60.1
unicode: 10.0
cldr: 32.0
tz: 2017c

Hexo Configuration

# Hexo Configuration
## Docs: https://hexo.io/docs/configuration.html
## Source: https://github.com/hexojs/hexo/

# Site
title: xxxx's blog
subtitle: 
description: 
author: xxxx
language: zh-CN
timezone:

# URL
## If your site is put in a subdirectory, set url as 'http://yoursite.com/child' and root as '/child/'
url: https://blog.xxxx.me
root: /
permalink: :year/:month/:day/:title.html
permalink_defaults:

# Directory
source_dir: source
public_dir: public
tag_dir: tags
archive_dir: archives
category_dir: categories
code_dir: downloads/code
i18n_dir: :lang
skip_render: 

# Writing
new_post_name: :year/:month/:day/:title.md # File name of new posts
default_layout: post
titlecase: true # Transform title into titlecase
external_link: true # Open external links in new tab
filename_case: 0
render_drafts: false
post_asset_folder: true
relative_link: false
future: true
highlight:
  enable: true
  line_number: true
  auto_detect: false
  tab_replace:
  
# Home page setting
# path: Root path for your blogs index page. (default = '')
# per_page: Posts displayed per page. (0 = disable pagination)
# order_by: Posts order. (Order by date descending by default)
index_generator:
  path: ''
  per_page: 10
  order_by: -date
  
# Category & Tag
default_category: uncategorized
category_map:
tag_map:

# Date / Time format
## Hexo uses Moment.js to parse and display date
## You can customize the date format as defined in
## http://momentjs.com/docs/#/displaying/format/
date_format: YYYY-MM-DD
time_format: HH:mm:ss

# Pagination
## Set per_page to 0 to disable pagination
per_page: 10
pagination_dir: page

# Extensions
## Plugins: https://hexo.io/plugins/
## Themes: https://hexo.io/themes/
theme: hexo-theme-next

NexT Information

NexT Version:

  • Latest Master branch
  • Latest Release version
  • Old version

NexT Scheme:

  • All schemes
  • Muse
  • Mist
  • Pisces
  • Gemini

NexT Configuration:

# ---------------------------------------------------------------
# SEO Settings
# ---------------------------------------------------------------

# Disable Baidu transformation on mobile devices.
disable_baidu_transformation: false

# Set a canonical link tag in your hexo, you could use it for your SEO of blog.
# See: https://support.google.com/webmasters/answer/139066
# Tips: Before you open this tag, remember set up your URL in hexo _config.yml (e.g. url: http://yourdomain.com)
canonical: true

# Change headers hierarchy on site-subtitle (will be main site description) and on all post/pages titles for better SEO-optimization.
seo: true

# If true, will add site-subtitle to index page, added in main hexo config.
# subtitle: Subtitle
index_with_subtitle: false

# Automatically add external URL with BASE64 encrypt & decrypt.
exturl: false

# Google Webmaster tools verification.
# See: https://www.google.com/webmasters
#google_site_verification:

# Bing Webmaster tools verification.
# See: https://www.bing.com/webmaster
#bing_site_verification:

# Yandex Webmaster tools verification.
# See: https://webmaster.yandex.ru
#yandex_site_verification:

# Baidu Webmaster tools verification.
# See: https://ziyuan.baidu.com/site
#baidu_site_verification:

# Enable baidu push so that the blog will push the url to baidu automatically which is very helpful for SEO.
baidu_push: true

Other Information

@siaimes siaimes added the Bug label May 5, 2019
@1v9

This comment has been minimized.

@siaimes
Author

siaimes commented May 9, 2019

@1v9 Thank you for your help, you solved part of my problem. With the approach you suggested, the canonical tags in blog posts are no longer wrong, but the canonical tags on the tags, categories, archives, etc. pages still are.

@ivan-nginx Can you update the source code of NexT to adapt it to my situation?

@siaimes
Author

siaimes commented May 9, 2019

@1v9 @ivan-nginx

I now know why these pages are still inconsistent. In the sitemap.xml generated by the hexo-generator-sitemap plugin, the links to tags, categories, archives, etc. contain the "index.html" suffix, but NexT does not use the "index.html" suffix when linking to these pages, so the canonical tag generated by NexT also lacks it. As a result, the URL submitted to Google contains "index.html" while the canonical URL does not.

So, I think this is a compatibility issue between NexT and hexo-generator-sitemap plugin. I hope you can solve it, thank you!

@1v9
Member

1v9 commented May 9, 2019

@siaimes You are right, I've tried the sitemap plugin and reproduced it. Maybe you can solve this issue with its template option. Let's wait for somebody (@stevenjoezhang 👏) to save us. You know, open source is open source but people are people, and it seems we are all busy at the moment 😘.

@siaimes
Author

siaimes commented May 10, 2019

@1v9 Thank you for your reply. The easiest way may be to add the "index.html" suffix to the links for categories, archives, etc. generated by the NexT theme. I look forward to your update of the NexT theme.

@ivan-nginx
Member

Guys, can u clearly explain what u got and what u need, in Actual behavior / Expected behavior style? Thanks.

@siaimes
Author

siaimes commented May 15, 2019

@ivan-nginx The original problem is that the links to the pages generated by NexT are inconsistent with the canonical tags in those pages, so Google cannot tell which URL the author intends to be canonical.

I modified the source code following @1v9's comment, and the problem seems to be solved.

You can see this on the website https://blog.siaimes.me/. The about page's URL is https://blog.siaimes.me/about/, and the canonical tag in the head is <link rel="canonical" href="https://blog.siaimes.me/about/">.

This is the result of fixing the bug using @1v9's comment.

If you only look at this, there seems to be no problem.

However, when you look into the sitemap (https://blog.siaimes.me/sitemap.xml) generated by hexo-generator-sitemap, you can see that the link for the about page is

<url>
<loc>https://blog.siaimes.me/about/index.html</loc>
<lastmod>2018-01-13T10:44:31.000Z</lastmod>
</url>

This is not the same as the canonical URL declared by NexT, so it still causes SEO confusion.
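
A rough way to spot this kind of mismatch is to compare each `<loc>` in the sitemap with the canonical URL of the same page. The sketch below is only an illustration: `sitemapLocs` is a made-up helper name, the sitemap is read with a regular expression rather than a real XML parser, and the canonical value is simply copied from the about page quoted above.

```javascript
// Hypothetical helper: collect every <loc> value from a sitemap string.
// A regular expression is enough for this flat format; a real XML parser
// would be more robust.
function sitemapLocs(xml) {
  const pattern = /<loc>([^<]*)<\/loc>/g;
  const locs = [];
  let match;
  while ((match = pattern.exec(xml)) !== null) {
    locs.push(match[1].trim());
  }
  return locs;
}

// Sitemap entry for the about page, as quoted in this comment.
const sitemap = `
<url>
<loc>https://blog.siaimes.me/about/index.html</loc>
<lastmod>2018-01-13T10:44:31.000Z</lastmod>
</url>`;

// Canonical URL declared in the head of the about page.
const canonical = 'https://blog.siaimes.me/about/';

for (const loc of sitemapLocs(sitemap)) {
  const status = loc === canonical ? 'match' : 'MISMATCH';
  console.log(`${status}: sitemap=${loc} canonical=${canonical}`);
}
```

For the entry above the loop reports a mismatch, because the sitemap URL ends in index.html and the canonical URL does not.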


@ivan-nginx
Member

ivan-nginx commented May 20, 2019

@siaimes I still don't understand your problem, and it seems it's not actually a bug. All pages will be reindexed with the new canonical paths over time. This needs some time, of course.

Let's go in order. You want:

  • https://blog.siaimes.me/about.html instead of https://blog.siaimes.me/about but both work
  • Fix sitemap.xml, but this is another problem related to the plugin

All those canonical links will appear and be indexed in Google Search. Let's see some examples:

Without any *.html suffixes (new style)

image

Last page

This page uses a named html file:
https://github.com/theme-next/theme-next.org/blob/source/source/docs/theme-settings/seo.md

seo.md will be converted to seo.html and all queries → /docs/theme-settings/seo.html

image

image

image

Relative page

This page uses a named directory and an index.html file:
https://github.com/theme-next/theme-next.org/blob/source/source/docs/theme-settings/index.md

index.md will be converted to index.html and all queries → /docs/theme-settings/index.html

image

image

image

With *.html suffixes (old style)

If u use the last file type, which is named about.md/about/index.html/about/

image

image

image

image

Solution?

So, if I understand right, u are talking just about backward compatibility for links already indexed with the old SEO style?

  1. Renaming all files of the last type to the relative type: /about.md → /about/index.md (or vice versa)
  2. Adding option suggested by @1v9, something like:
canonical:
  enable: true
  pretty: true
  #pretty: false

What do u think, guys, which variant is better?


@siaimes
Author

siaimes commented May 24, 2019

@ivan-nginx It has been modified according to the recommendation of 1v9, and the file name is as you said.

What I now understand is that although the link generated by NexT is https://blog.siaimes.me/about/, there is indeed an index.html file in the folder /about/, so there is no problem with the sitemap. I have now changed the menu part of the configuration file to something like this:

menu:
  home: / || home
  archives: /archives/index.html || archive
  categories: /categories/index.html || th
  tags: /tags/index.html || tags
  about: /about/index.html || user
  #schedule: /schedule/index.html || calendar
  #sitemap: /sitemap.xml || sitemap
  #commonweal: /404/index.html || heartbeat

The link generated by NexT is now https://blog.siaimes.me/about/index.html, but the canonical tag still has no index.html suffix.

@stevenjoezhang
Contributor

stevenjoezhang commented Aug 18, 2019

@stevenjoezhang
Contributor

@ivan-nginx
Member

Yeah! It will be good if Hexo merges it.

@stevenjoezhang
Contributor

stevenjoezhang commented Sep 27, 2019

Solution of this issue: use pretty_urls option in Hexo _config.yml

See also #1175
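
As far as the Hexo documentation describes it, `pretty_urls` (Hexo 4 and later) has two sub-options, `trailing_index` and `trailing_html`, both `true` by default, and setting one to `false` removes that suffix from permalinks. The snippet below is only a model of that described behavior for illustration, not Hexo's actual code: `applyPrettyUrls` is a made-up name, and leaving a trailing index.html untouched when only `trailing_html` is `false` is an assumption.

```javascript
// Illustrative model only (not Hexo's source): how the pretty_urls
// sub-options are documented to shape a permalink.
function applyPrettyUrls(permalink, prettyUrls) {
  // Both options are assumed to default to true, i.e. keep the suffix.
  const options = Object.assign({ trailing_index: true, trailing_html: true }, prettyUrls);
  let result = permalink;
  if (!options.trailing_index) {
    // Drop a trailing "index.html", leaving the directory URL.
    result = result.replace(/index\.html$/, '');
  }
  if (!options.trailing_html && !/index\.html$/.test(result)) {
    // Drop a trailing ".html" (assumed not to touch a trailing index.html).
    result = result.replace(/\.html$/, '');
  }
  return result;
}

const post = 'https://blog.xxxx.me/year/month/day/title.html';
const about = 'https://blog.siaimes.me/about/index.html';

console.log(applyPrettyUrls(post, {}));                         // unchanged, suffix kept
console.log(applyPrettyUrls(about, { trailing_index: false })); // https://blog.siaimes.me/about/
console.log(applyPrettyUrls(post, { trailing_html: false }));   // https://blog.xxxx.me/year/month/day/title
```

Under this model, the permalink and sitemap URLs for the about page would line up with the canonical URL https://blog.siaimes.me/about/ once `trailing_index` is `false`, while the defaults keep the .html suffix that the original report wanted preserved.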

@stevenjoezhang
Contributor

Fixed in #1143
