Incomplete descriptions in metadata #37

Closed
mooa-FA opened this issue Feb 4, 2020 · 1 comment

mooa-FA commented Feb 4, 2020

Hi, thanks for this great script. I found it a few days ago and it's really useful :) This isn't really an issue per se, but something I thought might be nice to amend when using the separate metadata files.

I had a look at the page source, and it looks like the descriptions are being taken from <meta property="og:description" content=". Unfortunately, that gives incomplete descriptions, limited to ~121 characters without spaces.

I'd like to propose the following edit to copy the full description. There was also an '& quot ;' issue in some of the titles, which is hopefully sorted as well. It might not be the cleanest way of doing it, but it seems to work okay. It copies the description verbatim, so if they've put a lot of line breaks in it, they'll appear in the metadata file. I'm interested to hear what you think :)

From line 147 'Get metadata'

            description="$(cat "$tempfile" | tr '\n' ' ' | sed 's/\(<div class="submission-description">\)/\n\1/gI' | sed 's/\(<\/div>\)/\1\n/gI' | grep -o '<div class="submission-description".*</div>' | sed 's/<div class="submission-description">                     //g' | sed 's@<br />@\n@g' | sed 's/<a href="//g' | sed 's@" class=".*</a>@@g' | sed 's@                </div>@@g' | sed 's/&quot;/"/g')"
            if [ "$classic" = true ]; then
                    title="$(grep -Eo '<h2>.*</h2>' "$tempfile" | awk -F "<h2>" '{print $2}' | awk -F "</h2>" '{print $1}' | sed 's/&quot;/"/g')"
            else
                    title="$(grep -Eo '<h2><p>.*</p></h2>' "$tempfile" | awk -F "<p>" '{print $2}' | awk -F "</p>" '{print $1}' | sed 's/&quot;/"/g')"
            fi
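
For illustration, the idea behind the description pipeline above (flatten the page to one line, cut out the submission-description block, turn <br /> tags back into line breaks) can be sketched as a small standalone function. This is only a minimal sketch, not the script's actual code: the function name extract_description and the sample page are hypothetical, and it assumes the description contains no nested <div> elements, since it stops at the first </div>.

```shell
#!/bin/sh
# Hypothetical helper: print the text inside the first
# <div class="submission-description"> ... </div> block of an HTML file.
extract_description() {
    # Flatten the file to a single line so the block can be matched at once.
    tr '\n' ' ' < "$1" |
        sed -n '/<div class="submission-description">/{
            s@.*<div class="submission-description">@@
            s@</div>.*@@
            s/^ *//
            s/ *$//
            p
        }' |
        # Turn <br /> tags (and any indentation after them) into line breaks.
        awk '{ gsub(/<br \/> */, "\n"); print }'
}

# Usage example with a made-up page.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<html><body>
<div class="submission-description">
    First line of the full description.<br />
    Second line.
</div>
</body></html>
EOF
extract_description "$sample"
rm -f "$sample"
```

Because the whole block is captured, nothing is cut off at the length limit seen with og:description, but any remaining tags or entities inside the block still need the clean-up steps from the proposal.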

mooa-FA commented Feb 5, 2020

Minor update to clean up a few more things in the description from line 148:
description="$(cat "$tempfile" | tr '\n' ' ' | sed 's/\(<div class="submission-description">\)/\n\1/gI' | sed 's/\(<\/div>\)/\1\n/gI' | grep -o '<div class="submission-description".*</div>' | sed 's/<div class="submission-description"> //g' | sed 's@<br />@\n@g' | sed 's@" title=.*</a>@@g' | sed 's/<a href="//g' | sed 's@" class=".*</a>@@g' | sed 's@ </div>@@g' | sed 's/&quot;/"/g' | sed 's/<a class="auto_link named_url" href="//g' | sed 's@</a>@@g' | sed 's/&#46;/./g' | sed 's@<i class=".*</i>@@g' | sed 's/&gt;/>/g' | sed 's/&lt;/</g' | sed 's/" class="iconusername.*align="middle//g' | sed 's/&amp;/\&/g' | sed "s/&apos;/'/g" | sed 's/&pound;/£/g' | sed 's/&yen;/¥/g' | sed 's/&euro;/€/g' | sed 's/<span.*span>//g' | sed 's/<strong class="bbcode bbcode_b">//g' | sed 's/<a class="auto_link named_url" href="//g' | sed 's/">.*strong>//g' | sed 's/">/ /g' | sed 's/\r//g' | sed 's/^ //g')"

if [ "$classic" = true ]; then
    title="$(grep -Eo '<h2>.*</h2>' "$tempfile" | awk -F "<h2>" '{print $2}' | awk -F "</h2>" '{print $1}' | sed 's/&quot;/"/g' | sed 's/&amp;/\&/g' | sed 's@/@_@g')"
else
    title="$(grep -Eo '<h2><p>.*</p></h2>' "$tempfile" | awk -F "<p>" '{print $2}' | awk -F "</p>" '{print $1}' | sed 's/&quot;/"/g' | sed 's/&amp;/\&/g' | sed 's@/@_@g')"
fi
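
The entity replacements used for both the title and the description could also be gathered into one helper, which keeps the two pipelines consistent. The following is a hedged sketch rather than part of the script: decode_entities is a hypothetical name, and it covers only some of the entities listed above (others, such as &pound; or &euro;, could be added the same way). It decodes &amp; last, so that a literal sequence like &amp;lt; becomes &lt; instead of being decoded twice.

```shell
#!/bin/sh
# Hypothetical helper: decode a handful of HTML entities read from stdin.
decode_entities() {
    # &amp; is handled last so already-escaped entities are not decoded twice.
    sed -e 's/&quot;/"/g' \
        -e "s/&apos;/'/g" \
        -e 's/&#46;/./g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g'
}

# Usage example with a made-up title.
printf '%s\n' 'Tom &amp; &quot;Jerry&quot;' | decode_entities
```

A title pipeline could then end in `| decode_entities | sed 's@/@_@g'`, keeping the slash replacement as a separate, filename-specific step.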

Xerbo closed this as completed Apr 20, 2020