This repository has been archived by the owner on Jun 3, 2020. It is now read-only.

Handle WordPress [caption] tags #38

Open
ianrenton opened this issue Apr 9, 2013 · 2 comments

Comments

@ianrenton

WordPress pages use a square-bracketed format for describing image captions in the editor markup, which is then converted to a div when WordPress renders the page.

Because exitwp/html2text doesn't understand this markup, it passes through to the markdown output as literal text that wraps around the image markdown and must be removed manually.

I realise there is no standard markdown for an image caption, but could the script do something a bit nicer than just leaving the [caption] blocks where they are? (e.g. remove them entirely, put the caption in as a new paragraph, etc.) Not an ideal solution, but I don't think there is an ideal solution.

@THPubs

THPubs commented Jun 29, 2013

This is a big problem. I can't figure out how to get rid of the captions, and I can't edit them manually because I have nearly 1600 posts!

@gpoo

gpoo commented Aug 15, 2013

To remove the captions, you can add a regular expression to body_replace in config.yaml. Something like:

body_replace: {
  '\[caption.*?caption="(.*?)"\](.*?)\[\/caption\]': '\2',
}

Notice the pattern has 2 capture groups, so the replacement can also use the text of the caption (i.e. \1) and not only the wrapped content (\2). You can tweak the regular expression to your own needs.
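
To try a pattern out before running a full conversion, here is a minimal Python sketch. It assumes the body_replace entries are applied with Python's `re.sub`, which this thread does not confirm. The function names and the sample string are hypothetical, and the pattern uses non-greedy quantifiers so that two captions on one line are matched separately.

```python
import re

# Pattern adapted from the suggestion above. The non-greedy ".*?" quantifiers
# are an assumption added here so neighbouring captions do not merge.
CAPTION_RE = re.compile(r'\[caption.*?caption="(.*?)"\](.*?)\[/caption\]')


def strip_captions(body: str) -> str:
    """Keep only the wrapped content (group 2) and drop the caption text."""
    return CAPTION_RE.sub(r'\2', body)


def captions_to_paragraphs(body: str) -> str:
    """Keep the wrapped content and put the caption text (group 1) in its own paragraph."""
    return CAPTION_RE.sub(r'\2\n\n\1\n', body)


# Hypothetical sample of what a caption block might look like in the body.
sample = (
    '[caption id="attachment_1" align="aligncenter" width="300" '
    'caption="A cat"]![cat](cat.jpg)[/caption]'
)

if __name__ == "__main__":
    print(strip_captions(sample))
    print(captions_to_paragraphs(sample))
```

The second function corresponds to the "put the caption in as a new paragraph" idea from the original report; in config.yaml that would mean using both \1 and \2 in the replacement value.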
