hanging <p> and/or </p> #492

Closed
tbeason opened this issue May 16, 2020 · 5 comments

Comments

@tbeason

tbeason commented May 16, 2020

I see places where <p> or </p> exists without its complement. It doesn't seem to mess up the page, but it still feels like an issue.

There even seems to potentially be a pattern.

If I include just a sentence of standard Markdown, that gets prefixed by <p> and does not get the corresponding </p>.
Here is an example that was a sentence by itself,

I am a proponent of open source software and transparent academic research. I am an active member of the [Julia](https://julialang.org/) community.

which became

<p>I am a proponent of open source software and transparent academic research. I am an active member of the <a href="https://julialang.org/">Julia</a> community.

and another, which was just a title on a Markdown page (# Research),

<p><h1 id="research"><a href="#research">Research</a></h1>

Alternatively, this

@@font-weight-bold,mb-0 The Anatomy of Trading Algorithms @@
@@mb-0 with [Sunil Wahal](https://asu.pure.elsevier.com/en/persons/sunil-wahal). [SSRN](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3497001) @@
We study the anatomy of four widely used standardized institutional trading algorithms
representing \$675 billion in demand from 961 institutions between 2012 and 2016. The central tradeoff in these algorithms is between the desire to trade and transaction costs. Large parent orders generate hundreds of child orders which strategically employ the price, time, and display priority rules embodied in market structure to navigate this tradeoff. The distribution of child orders is non-random, generating strategic runs which oscillate between providing and taking liquidity. Price impact occurs both at the time an order is submitted to the book (regardless of whether it is filled), and at the time of execution. Passive child orders have much lower likelihood of execution but still incur substantial price impact. Conversely, marketable orders, even though immediately executable, do not necessarily guarantee execution and generate even larger price impact.

turned into this

<div class="font-weight-bold mb-0">The Anatomy of Trading Algorithms</div> <div class="mb-0">with <a href="https://asu.pure.elsevier.com/en/persons/sunil-wahal">Sunil Wahal</a>. <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id&#61;3497001">SSRN</a></div> We study the anatomy of four widely used standardized institutional trading algorithms representing &#36;675 billion in demand from 961 institutions between 2012 and 2016. The central tradeoff in these algorithms is between the desire to trade and transaction costs. Large parent orders generate hundreds of child orders which strategically employ the price, time, and display priority rules embodied in market structure to navigate this tradeoff. The distribution of child orders is non-random, generating strategic runs which oscillate between providing and taking liquidity. Price impact occurs both at the time an order is submitted to the book &#40;regardless of whether it is filled&#41;, and at the time of execution. Passive child orders have much lower likelihood of execution but still incur substantial price impact. Conversely, marketable orders, even though immediately executable, do not necessarily guarantee execution and generate even larger price impact.</p>

which has the trailing </p> at the very end.

It doesn't seem to happen within an "environment" like lists. And it doesn't always happen. But in my relatively simple page, I see it in about 10 spots.

@tlienart
Owner

Haha yes I'm well aware of this. It's just... pretty hard.

There's a few things going on but to make things simple, basically Franklin proceeds like this:

  1. goes over the source page and finds a bunch of "tokens"
  2. goes over the tokens and determines "blocks" (typically a "from" token and a "to" token)
  3. builds the page by lacing together "resolved texts" and "resolved blocks"

Resolved texts (considered as "plain markdown") are passed to Julia's Markdown module, where they are transformed to HTML. Resolved blocks are processed recursively until they lead to a mix of plain HTML and plain markdown which, again, is sent to Julia's Markdown module to get plain HTML.
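A minimal sketch of the lacing step described above, not Franklin's actual code: `md2html` is a hypothetical stand-in that sends plain markdown through Julia's Markdown standard library, and `block_html` is hard-coded to stand for a block that has already been resolved. It is only meant to show that each text chunk is converted on its own, so paragraph wrapping is decided without knowing what sits next to it.

```julia
using Markdown

# Hypothetical stand-in for the "plain markdown -> HTML" step (assumption, not Franklin's API).
md2html(s) = Markdown.html(Markdown.parse(s))

# Toy inputs: two resolved texts with one resolved block between them.
text_before = "Some intro text."
block_html  = "<div class=\"mb-0\">resolved block</div>"  # pretend output of a resolved block
text_after  = "Some closing text."

# "Lacing": concatenate the independently converted pieces.
page = md2html(text_before) * block_html * md2html(text_after)
println(page)
```

Because the two text chunks are wrapped independently, the resolved block ends up between two separate paragraphs. That seam is where, per the discussion here, paragraph tags can end up misplaced.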

There are two main issues resulting from this process:

  1. Julia's Markdown module is not exactly perfect and tends to add a whole bunch of <p> everywhere. When you lace everything back together, you can end up with places which should have ps and don't, and places which shouldn't and do (around titles and lists in particular).
  2. the process of inserting blocks (e.g. resolving latex-like commands) can make it pretty difficult to figure out where there should actually be a <p> or not.

Anyway, both issues can be addressed up to a point: the first by using Pandoc's markdown-to-HTML conversion (see #459), the second by having a process that keeps track of opening and closing tags and tries to balance everything.
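As a rough illustration of "keeping track of opening and closing things", here is a sketch of a stack-based scan. `unmatched_p` is a hypothetical helper, not something Franklin provides, and it only looks at bare `<p>` and `</p>` tags (no attributes).

```julia
# Hypothetical helper (not part of Franklin): walk the <p> and </p> tags in order
# and report the ones that have no partner.
function unmatched_p(html::AbstractString)
    open_stack  = Int[]   # offsets of <p> tags still waiting for a </p>
    stray_close = Int[]   # offsets of </p> tags with no open <p> before them
    for m in eachmatch(r"</?p>", html)
        if m.match == "<p>"
            push!(open_stack, m.offset)
        elseif isempty(open_stack)
            push!(stray_close, m.offset)
        else
            pop!(open_stack)   # this </p> closes the most recent open <p>
        end
    end
    return (unclosed = open_stack, stray = stray_close)
end

res = unmatched_p("</p>some text<p>")
println(res)
```

Unlike a plain count, this also flags a closing tag that comes before its opening tag, and the offsets point at where a fix would have to go.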

This is not high on my priority list because browsers are excellent at ignoring the placement of the closing </p>. That is not to say I wouldn't like to fix it, but it's a lot of work for very little improvement for users; while I do want to do it, there are other things I'd like to fix first :-)

I hope that makes sense!

@tlienart
Owner

tlienart commented May 16, 2020

I should add that I'm grateful for the specific reproducing examples which I'll look into, there's probably things that can be done to mitigate those cases... if you look at them in isolation:

julia> s |> fd2html
"<p>I am a proponent of open source software and transparent academic research. I am an active member of the <a href=\"https://julialang.org/\">Julia</a> community.</p>\n"
julia> s = """# Research""" |> fd2html
"<h1 id=\"research\"><a href=\"#research\">Research</a></h1>"

One extra note: you use Franklin.fd2html in your hfun (good!). Note that while you use internal=true (good!), there should be an additional keyword to trim dangling <p>s, which I haven't implemented yet (see #489).

other cross refs for me: #82 , possibly revisit 747a340

@tbeason
Author

tbeason commented May 16, 2020

Definitely agree that it is low priority because the browsers handle it fine.

It must be related to #489 . I didn't think that it was when I posted but now I see that, for example, the # Research is at the top of the file I'm inserting so it would presumably get a <p> stuck in front. It doesn't always happen like that though. When I use that {{insertmd file}} inside of a bunch of other stuff, the <p> is not stuck in front of the first headline. Presumably the parsing absorbs it somehow.

Here are the relevant links in case it helps (next version of my website eventually)
http://tbeason.com/FranklinWebby/
https://github.com/tbeason/FranklinWebby

@tlienart
Owner

I think what we'd need is a list of short examples where, if you do s |> fd2html, the number of opening and closing ps doesn't match. Having a bank of those would help greatly.

One could also write a checkps function (bonus points for calling it chickpeas I guess) that counts the number of opening <p> and closing </p> tags; that would help detect blocks that could be part of that list of bad examples.

function chickpeas(s)
    # count opening and closing paragraph tags (bare <p> and </p> only)
    nopen = count(r"<p>", s)
    nclose = count(r"</p>", s)
    # balanced when both counts agree
    return nopen == nclose
end
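A sketch of how such a check could be run over a small bank of examples. The names `p_counts`, `is_balanced` and `bank` are hypothetical, and the strings are shortened versions of the outputs reported earlier in this thread, not fresh fd2html output.

```julia
# Hypothetical helpers: tally bare <p> and </p> tags, and compare the tallies.
p_counts(s) = (count(r"<p>", s), count(r"</p>", s))
is_balanced(s) = ==(p_counts(s)...)

# Shortened examples taken from the report in this issue.
bank = [
    "<p><h1 id=\"research\"><a href=\"#research\">Research</a></h1>",
    "<p>I am an active member of the <a href=\"https://julialang.org/\">Julia</a> community.</p>\n",
    "<div class=\"mb-0\">with Sunil Wahal.</div> We study the anatomy of trading algorithms.</p>",
]

# Keep only the examples whose tag counts disagree.
bad = filter(!is_balanced, bank)
for s in bad
    nopen, nclose = p_counts(s)
    println("unbalanced (", nopen, " open, ", nclose, " close): ", s)
end
```

Only the unbalanced entries are printed, so the loop output can be pasted straight into a list of bad examples.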

@tlienart tlienart added the wip label Jul 6, 2020
@tlienart
Owner

tlienart commented Jul 8, 2020

Alright, this should be mostly fixed by #546. @tbeason, I might reach out to you separately to re-try this once I've written explicit guidelines to avoid problems, but basically, if you want to insert a "block", separate it with skipped lines like

A

[something to insert a block]

B

and along with internal fixes, that should get the ps to be balanced most of the time.

@tlienart tlienart closed this as completed Jul 8, 2020