Concatenated words after stripping HTML tags #13

Closed
sandrohoerler opened this issue Mar 27, 2018 · 2 comments

Comments

@sandrohoerler

Example: https://cross-works.net/uber-uns/

Helmut SprollManaging Director
or
Sivan FadelliBuchhaltung & Administration

cross-html.txt
inscriptis_output.txt

@AlbertWeichselbraun
Contributor

This layout problem is caused by an external style file (style.min.css) that sets the display: block property for the person-name class.

.fusion-person .person-desc .person-author .person-name, .fusion-person .person-desc .person-author .person-title {
    display: block;
}

Inscriptis already handles display: block correctly, but only if

  1. the CSS is embedded in the HTML, and
  2. the style attribute is used rather than classes.
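
To illustrate why the names and titles run together, here is a minimal sketch built only on Python's standard `html.parser`. It is not Inscriptis' actual implementation; `BlockAwareTextExtractor` and `extract_text` are hypothetical names, and the markup is a simplified stand-in for the linked page. The extractor starts a new line only when an element carries an inline `display: block` style, so class-based styling goes unnoticed.

```python
from html.parser import HTMLParser


class BlockAwareTextExtractor(HTMLParser):
    """Strip tags; start a new line for elements with an inline display: block."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style") or ""
        # Normalise "display: block" to "display:block" before matching.
        if "display:block" in style.replace(" ", "").lower():
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html):
    """Return the text content of `html` (hypothetical helper for this sketch)."""
    parser = BlockAwareTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Simplified stand-ins for the markup on the example page (assumed, not copied).
INLINE_STYLE = (
    '<span style="display: block">Helmut Sproll</span>'
    '<span style="display: block">Managing Director</span>'
)
CLASS_ONLY = (
    '<span class="person-name">Helmut Sproll</span>'
    '<span class="person-title">Managing Director</span>'
)

print(repr(extract_text(INLINE_STYLE)))  # 'Helmut Sproll\nManaging Director'
print(repr(extract_text(CLASS_ONLY)))    # 'Helmut SprollManaging Director'
```

The second call reproduces the concatenation from the report, because the `display: block` rule lives in the external stylesheet and never reaches the parser.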

Addressing the second issue doesn't seem to be too much of a problem, but retrieving external CSS files adds another layer of complexity (especially since many web servers tend to throttle or block access from non-browser user agents). But if this is a go/no-go criterion for multiple use cases, we will implement it as well.
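
A rough sketch of what class support could look like, assuming the set of block-level classes is already known. `BLOCK_CLASSES` is a hypothetical lookup table derived by hand from the rule quoted above; real CSS selector matching (descendant selectors, specificity, fetching the stylesheet) is deliberately left out, and none of this reflects how Inscriptis is actually implemented.

```python
from html.parser import HTMLParser

# Hypothetical lookup table, hand-derived from the quoted style.min.css rule.
BLOCK_CLASSES = {"person-name", "person-title"}


class ClassAwareTextExtractor(HTMLParser):
    """Strip tags; start a new line for elements whose class is block-level."""

    def __init__(self, block_classes):
        super().__init__()
        self.block_classes = set(block_classes)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if self.block_classes.intersection(classes):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def extract_text_with_classes(html, block_classes=BLOCK_CLASSES):
    """Return the text of `html`, honouring the given block-level classes."""
    parser = ClassAwareTextExtractor(block_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Simplified stand-in for the markup on the example page (assumed).
SAMPLE = (
    '<span class="person-name">Sivan Fadelli</span>'
    '<span class="person-title">Buchhaltung &amp; Administration</span>'
)

print(repr(extract_text_with_classes(SAMPLE)))
# 'Sivan Fadelli\nBuchhaltung & Administration'
```

Passing an empty set as `block_classes` brings back the concatenated output, which makes the effect of the lookup easy to compare.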

AlbertWeichselbraun added a commit that referenced this issue Sep 24, 2019
- add the `--indentation` command line option for inscript.py
- add the `indentation` parameter to `get_text()`
- this addresses #13 and #18.
@AlbertWeichselbraun
Contributor

This is fixed in version 0.4.1.1+.
