Substitute image links with downloaded paths #42

Open
berezovskyi opened this Issue Jun 15, 2013 · 2 comments

3 participants
@berezovskyi

It's good that exitwp has a download_images option. It would be great if all image links in the posts were also replaced with the paths of the downloaded copies.
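Here is a minimal sketch of what that substitution could look like. It assumes the download step records the local path that each original image URL was saved to. The downloaded mapping and the substitute_image_links name are hypothetical and are not part of exitwp.

```python
import re


def substitute_image_links(body, downloaded):
    """Replace each original image URL in body with its downloaded local path.

    downloaded is a hypothetical mapping {original_url: local_path}
    recorded while the images are fetched.
    """
    if not downloaded:
        return body  # an empty pattern would match at every position
    # Longest URLs first, so the alternation prefers "cat.jpg?w=300" over "cat.jpg".
    urls = sorted(downloaded, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(url) for url in urls))
    # One pass over body: text that was just substituted is never matched again.
    return pattern.sub(lambda match: downloaded[match.group(0)], body)


if __name__ == '__main__':
    mapping = {
        'http://domain.com/wp-content/uploads/2012/07/cat.jpg': '/images/cat.jpg',
    }
    html = '<img src="http://domain.com/wp-content/uploads/2012/07/cat.jpg">'
    print(substitute_image_links(html, mapping))
```

Matching exact URLs, rather than a URL pattern, means only images that were actually downloaded get rewritten. Links to anything that failed to download are left pointing at the original site.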

@andrewferrier

andrewferrier Jan 14, 2015

+1. This would be really useful.

@coopermaruyama

coopermaruyama Feb 23, 2015

I've worked around this as follows. It should work for images you uploaded with WordPress's built-in uploader, and you can easily adapt it to other image locations:

1. Use the body_replace config block in config.yml to have exitwp rewrite the paths:

body_replace: {
  'http://domain.com/wp-content/uploads/[0-9]+/[0-9]+/': '/media/images/{{ page.date }}-{{ page.slug }}/'
}

2. By default, page.date is formatted like 2012-07-02 00:15:24+00:00, but the path exitwp creates for images uses only the 2012-07-02 part. Therefore we need to remove everything after YYYY-MM-DD in the front matter of all the posts exitwp generates. With your text editor or a shell tool (I used Sublime Text), run a regex find/replace on the files exitwp generates in build/jekyll/domain.com/_posts:

  • Find: (date: [0-9]+-[0-9]+-[0-9]+)(.+)$
  • Replace with: $1

3. In your Jekyll site directory, create /media/images/ and move the images downloaded by exitwp into it.
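Steps 2 and 3 can also be scripted instead of done by hand. The sketch below makes two assumptions. First, it assumes exitwp writes the date as "date: YYYY-MM-DD HH:MM:SS+00:00" on its own front-matter line. Second, it assumes the downloaded images sit in one folder per post inside a directory you pass in. The function names are hypothetical helpers and are not part of exitwp. The date pattern is slightly stricter than the editor regex: it requires four, two and two digits followed by a space. That makes it safe to run twice, because already-truncated dates no longer match.

```python
import re
import shutil
from pathlib import Path

# Step 2: a front-matter line such as "date: 2012-07-02 00:15:24+00:00".
# Group 1 keeps "date: 2012-07-02"; the space and time part are dropped.
# Requiring the space means already-truncated dates are left alone.
DATE_LINE = re.compile(r'^(date: [0-9]{4}-[0-9]{2}-[0-9]{2}) .*$', re.MULTILINE)


def truncate_date(text):
    """Keep only YYYY-MM-DD on every line that starts with 'date:'."""
    return DATE_LINE.sub(r'\1', text)


def truncate_dates_in_dir(posts_dir):
    """Apply truncate_date in place to every file in posts_dir; return how many changed."""
    changed = 0
    for path in sorted(Path(posts_dir).iterdir()):
        if not path.is_file():
            continue
        original = path.read_text(encoding='utf-8')
        updated = truncate_date(original)
        if updated != original:
            path.write_text(updated, encoding='utf-8')
            changed += 1
    return changed


# Step 3: move each downloaded image folder into <jekyll_root>/media/images/.
def move_images(exitwp_images_dir, jekyll_root):
    """Move every entry of exitwp_images_dir into media/images; return the moved names."""
    target = Path(jekyll_root) / 'media' / 'images'
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(Path(exitwp_images_dir).iterdir()):
        destination = target / entry.name
        if destination.exists():
            # Refuse to merge into or overwrite an existing folder.
            raise FileExistsError(str(destination))
        shutil.move(str(entry), str(destination))
        moved.append(entry.name)
    return moved
```

For example, call truncate_dates_in_dir('build/jekyll/domain.com/_posts'). Then call move_images with the directory where exitwp saved the images and the root of your Jekyll site.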

In step 1, using {{ page.date | date: "%F" }} instead of {{ page.date }} would remove the need for step 2, but for some reason html2text.py breaks when the replacement contains double quotes. I tried escaping them with backslashes, but that doesn't work either. My Python is weak, so maybe someone can build this behavior into exitwp and open a PR.
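Here is a minimal sketch of how that behavior could be built into exitwp. The idea is to format the date in Python while each post is converted, so no Liquid filter (and no double quotes) ever reaches html2text.py. It assumes the per-post image folder is named YYYY-MM-DD-slug, as described in step 2. The function name, the domain.com pattern and the /media/images prefix are placeholders, not exitwp's actual code.

```python
import re
from datetime import datetime

# Placeholder pattern for images uploaded through WordPress's built-in uploader;
# replace domain.com with the real site (dots escaped so they match literally).
UPLOADS = re.compile(r'https?://domain\.com/wp-content/uploads/[0-9]+/[0-9]+/')


def rewrite_image_links(body, post_date, slug, prefix='/media/images'):
    """Point uploaded-image URLs in body at prefix/YYYY-MM-DD-slug/."""
    folder = '%s/%s-%s/' % (prefix, post_date.strftime('%Y-%m-%d'), slug)
    # A function replacement is inserted verbatim, so re.sub never
    # interprets backslashes or group references in the folder string.
    return UPLOADS.sub(lambda _match: folder, body)


if __name__ == '__main__':
    html = '<img src="http://domain.com/wp-content/uploads/2012/07/cat.jpg">'
    print(rewrite_image_links(html, datetime(2012, 7, 2, 0, 15, 24), 'my-post'))
    # <img src="/media/images/2012-07-02-my-post/cat.jpg">
```

Because the path is computed during conversion, the generated posts contain plain URLs. The front-matter date can then stay in its original format.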
