Add optional support for image indexing #19

Open
vezaynk opened this issue Aug 19, 2017 · 4 comments

Comments

@vezaynk
Owner

vezaynk commented Aug 19, 2017

Spec: https://support.google.com/webmasters/answer/178636

Example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
      <loc>http://example.com/sample.html</loc>
      <image:image>
        <image:loc>http://example.com/image.jpg</image:loc>
      </image:image>
      <image:image>
        <image:loc>http://example.com/photo.jpg</image:loc>
      </image:image>
    </url>
</urlset>

The generator should be able to emit sitemaps that follow Google's image sitemap spec, as in the example above.
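For anyone who wants to experiment with the output format, here is a rough sketch of how such a sitemap could be built. It is written in Python with the standard library only and is not taken from this project's code; the function name `build_image_sitemap` and the "page URL to list of image URLs" input shape are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Return sitemap XML (bytes) for a dict of page URL -> list of image URLs.

    The input shape is a hypothetical choice for this sketch.
    """
    # Register prefixes so the output uses a default namespace and "image:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # One <image:image> entry per image found on the page.
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url

    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sample = {
        "http://example.com/sample.html": [
            "http://example.com/image.jpg",
            "http://example.com/photo.jpg",
        ]
    }
    print(build_image_sitemap(sample).decode("utf-8"))
```

Using namespace-aware element names rather than string concatenation means URLs containing `&` are escaped automatically.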

Sent from my OnePlus ONE A2005 using FastHub

@vezaynk
Owner Author

vezaynk commented Aug 19, 2017

For lack of an easier implementation: if the option is enabled, an img_index function will be called from within the scan_url function whenever a URL fails the header check.

@vezaynk
Owner Author

vezaynk commented Aug 26, 2017

Personal objective: I'm going to try to do this over the weekend.

@vezaynk
Owner Author

vezaynk commented Aug 27, 2017

I have misjudged the extent of the effort. This opens its own can of worms:

  1. Scanning hrefs and imgs
  2. Identifying images
  3. Keeping track of context (which page each image belongs to)
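To make the three steps above more concrete, here is a minimal sketch in Python (standard library only). It is not this project's implementation: `PageScanner`, `scan_page`, and `looks_like_image` are hypothetical names, and identifying images by file extension is an assumption that would miss images served from extensionless URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed list of extensions; a real crawler might check Content-Type instead.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg")


def looks_like_image(url):
    """Step 2: guess whether a URL points at an image from its path."""
    return urlparse(url).path.lower().endswith(IMAGE_EXTENSIONS)


class PageScanner(HTMLParser):
    """Step 1: collect <a href> and <img src> targets from one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url  # Step 3: the context every result is tied to
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self._add(self.images, attrs["src"])
        elif tag == "a" and attrs.get("href"):
            # A link straight to an image file counts as an image of this page.
            target = urljoin(self.page_url, attrs["href"])
            bucket = self.images if looks_like_image(target) else self.links
            self._add(bucket, attrs["href"])

    def _add(self, bucket, raw_url):
        absolute = urljoin(self.page_url, raw_url)  # resolve relative URLs
        if absolute not in bucket:  # skip duplicates, keep first-seen order
            bucket.append(absolute)


def scan_page(page_url, html):
    """Return (links_to_crawl, images_on_this_page) for one document."""
    scanner = PageScanner(page_url)
    scanner.feed(html)
    return scanner.links, scanner.images
```

Returning the images together with the page URL they were found on is what lets the sitemap writer group each image under the right `<url>` entry later.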

This is not something I can do in a weekend. I am tempted to mark this as out of scope, but it looks like a fun feature to try to implement. While I will never officially support it, I will probably do it anyway.

With that said, PRs are welcome if anybody wants to do this themselves in the meantime!

@ghost

ghost commented Sep 7, 2017

Licenses are important for image sitemaps, as there is no other feasible way to communicate image licenses to search engines. If the license is site-wide, there could be a command-line option for supplying it (a URL pointing to the actual license), for example:
--license http://creativecommons.org/publicdomain/zero/1.0/

Which would be written to sitemap.xml inside each <image:image> element, next to <image:loc>:

<image:license>http://creativecommons.org/publicdomain/zero/1.0/</image:license>
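A rough sketch of how a site-wide license option could be wired up, in Python with the standard library. The `--license` flag is the one proposed above; `parse_args` and `image_element` are hypothetical helpers, not part of this project. Note that in Google's image sitemap schema `<image:license>` is a child of `<image:image>`, placed alongside `<image:loc>`.

```python
import argparse
import xml.etree.ElementTree as ET

IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def parse_args(argv=None):
    """Parse the proposed --license option (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Image sitemap sketch")
    parser.add_argument(
        "--license",
        default=None,
        help="URL of the license that applies to every image on the site",
    )
    return parser.parse_args(argv)


def image_element(image_url, license_url=None):
    """Build one <image:image> entry, adding <image:license> when given."""
    ET.register_namespace("image", IMAGE_NS)
    image = ET.Element(f"{{{IMAGE_NS}}}image")
    ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    if license_url:
        # Sibling of <image:loc>, not nested inside it.
        ET.SubElement(image, f"{{{IMAGE_NS}}}license").text = license_url
    return image


if __name__ == "__main__":
    args = parse_args()
    element = image_element("http://example.com/image.jpg", args.license)
    print(ET.tostring(element, encoding="unicode"))
```

Run with `--license http://creativecommons.org/publicdomain/zero/1.0/` to see the extra element; without the flag only `<image:loc>` is emitted.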

There are also other tags in image sitemaps that could be populated from the HTML when the information is present, including:

  • <image:title> - from the img element's title and/or alt attribute
  • <image:caption> - from a figcaption element, or the title and/or alt attribute
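One possible way to read these values from the HTML, sketched in Python with the standard library. The class and function names are made up for illustration, and the fallback order (title attribute first, then alt; figcaption text overriding both for the caption) is an assumption rather than anything the spec prescribes. Nested figure elements are not handled.

```python
from html.parser import HTMLParser


class ImageMetadataParser(HTMLParser):
    """Collect a title and caption candidate for every <img> in a document."""

    def __init__(self):
        super().__init__()
        self.images = []             # one dict per <img>
        self._figure_images = None   # images inside the current <figure>
        self._caption_parts = None   # text pieces inside <figcaption>
        self._figure_caption = None  # finished caption of the current <figure>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "figure":
            self._figure_images = []
            self._figure_caption = None
        elif tag == "figcaption" and self._figure_images is not None:
            self._caption_parts = []
        elif tag == "img" and attrs.get("src"):
            # Assumed preference: title attribute first, then alt.
            title = attrs.get("title") or attrs.get("alt") or None
            entry = {"src": attrs["src"], "title": title, "caption": title}
            self.images.append(entry)
            if self._figure_images is not None:
                self._figure_images.append(entry)

    def handle_data(self, data):
        if self._caption_parts is not None:
            self._caption_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "figcaption" and self._caption_parts is not None:
            # Collapse whitespace across any inline markup in the caption.
            text = " ".join("".join(self._caption_parts).split())
            self._figure_caption = text or None
            self._caption_parts = None
        elif tag == "figure" and self._figure_images is not None:
            if self._figure_caption:
                # The figcaption wins over title/alt for images in the figure.
                for entry in self._figure_images:
                    entry["caption"] = self._figure_caption
            self._figure_images = None
            self._figure_caption = None


def extract_image_metadata(html):
    parser = ImageMetadataParser()
    parser.feed(html)
    parser.close()
    return parser.images
```

Images with neither attribute nor a figcaption come back with `None` for both fields, so the sitemap writer can simply skip the optional tags for them.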

Video sitemaps are not very different from image sitemaps, but they require a few more fields:

  • Title
  • Description
  • Play page URL (web page URL)
  • Thumbnail URL
  • Video file URL

These could be crawled from the HTML or, if not present, populated with placeholders.

After the image crawling is working, I am happy to offer the project an online environment I have already coded where people can generate image sitemaps.

I will take a look at your code later and see if I can throw in something more tangible than just ideas and testing.
