Inconsistent blockquote use in legacy /posts "body" field #36

Closed
nostalgebraist opened this issue Nov 2, 2020 · 6 comments

@nostalgebraist

I am using the Tumblr API through pytumblr, specifically the /posts endpoint with npf=false.

One of the fields in the response is called body and contains HTML. Reblog chains are represented with nested blockquotes, like the way the dashboard used to look a long time ago.

I sometimes encounter posts where the nested blockquotes in body don't correctly convey the structure of the reblog chain.

I am wondering:

  • Why does this happen?
  • Is there a "correct" way to parse legacy format that would handle cases like this?
    • For example, perhaps I should be ignoring body and treating trail (together with answer etc. when applicable) as the source of truth.
  • If there is a "correct" way to parse legacy format, could we get it documented in the API docs?
    • For example, if trail should be preferred over body, it would be nice to see this called out.
      • ... ideally with additional documentation on how to correctly parse entries in trail.

If this is simply an inherent problem with legacy format, and the recommended mitigation is to use NPF for post consumption, then it would be nice to add this recommendation to the docs.
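
To make the "trail as source of truth" idea above concrete, here is a minimal sketch of reading the chain from trail instead of parsing body. It assumes each trail entry carries a blog object with a name plus a content (or content_raw) HTML string. Those field names are my reading of the legacy response, not something the docs spell out, and chain_from_trail is a made-up helper name. It does not handle answer posts or other post types.

```python
def chain_from_trail(post):
    """Return [(blog_name, content_html), ...] in reblog order.

    Assumes (not confirmed by the docs) that each entry in post["trail"]
    looks like {"blog": {"name": ...}, "content": ..., "content_raw": ...}.
    """
    chain = []
    for item in post.get("trail", []):
        blog_name = item.get("blog", {}).get("name", "<unknown>")
        # Prefer the sanitized "content"; fall back to "content_raw".
        content = item.get("content") or item.get("content_raw", "")
        chain.append((blog_name, content))
    return chain


if __name__ == "__main__":
    # Hand-written stand-in for an API response, not real data.
    fake_post = {
        "trail": [
            {"blog": {"name": "first-blog"}, "content": "<p>original</p>"},
            {"blog": {"name": "second-blog"}, "content": "<p>reply</p>"},
        ]
    }
    for name, html in chain_from_trail(fake_post):
        print(f"{name}: {html}")
```

With this approach each entry's text stays attached to its own author, so the ordering does not depend on how the blockquotes in body happen to nest.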


An example of the behavior is

https://ofthefog.tumblr.com/post/625973088048300032/i-eat-things-by-breaking-off-small-chunks-and

which is a three-post reblog chain, with a post by ofthefog, then a response by facelessoldgargoyle, and then a response by ofthefog.

If I request this post with a pytumblr client named client, by calling

post = client.posts('ofthefog', id=625973088048300032)['posts'][0]

then post['body'] contains the following:

<p><a class="tumblr_blog" href="https://facelessoldgargoyle.tumblr.com/post/625972741512839169">facelessoldgargoyle</a>:</p><blockquote><p>Does this mean you swallowed it whole, or in reasonably large chunks? Or will the future have to reconstruct the skull from tiny tiny fragments?</p><p><a class="tumblr_blog" href="https://ofthefog.tumblr.com/post/625923656553480192">ofthefog</a>:</p><blockquote><p>I ate half of a ram skull today which means that if my body was the crust I would have enough to make at least one reasonably complete fossil. </p></blockquote><p>Also, why? Was it good? Why only half?</p></blockquote><p>I eat things by breaking off small chunks and chewing. I became full, so I stopped. </p><p>I am given to understand this is normal. </p><p>I just like bones. They usually have tasty insides, but the outside makes me have very strong bones. Same with eggshells and horns. </p><p>My body is not the crust so it cannot be reconstructed with human hands. </p><figure class="tmblr-full" data-orig-height="1280" data-orig-width="720"><img src="https://64.media.tumblr.com/04d3ab3e179b1b572df771c91f390756/00d649404253a31c-a0/s640x960/8e6d043e264a667f716018976e813f2337e06931.jpg" data-orig-height="1280" data-orig-width="720"/></figure>

This splits facelessoldgargoyle's post in two, with ofthefog's first post appearing between one paragraph and another one. I'll have github render the same HTML below, to make this visible:


facelessoldgargoyle:

> Does this mean you swallowed it whole, or in reasonably large chunks? Or will the future have to reconstruct the skull from tiny tiny fragments?
>
> ofthefog:
>
> > I ate half of a ram skull today which means that if my body was the crust I would have enough to make at least one reasonably complete fossil.
>
> Also, why? Was it good? Why only half?

I eat things by breaking off small chunks and chewing. I became full, so I stopped.

I am given to understand this is normal.

I just like bones. They usually have tasty insides, but the outside makes me have very strong bones. Same with eggshells and horns.

My body is not the crust so it cannot be reconstructed with human hands.
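
For anyone who wants to flag affected posts, here is a rough heuristic sketch using only the standard library. It assumes that in a well-formed legacy body, the tumblr_blog attribution link for the previous post is the first thing inside each blockquote, so any text that appears inside a blockquote before such a link suggests the split described above. That assumption is mine, not documented behavior, and has_split_quote is a hypothetical helper name.

```python
from html.parser import HTMLParser


class SplitQuoteDetector(HTMLParser):
    """Flags blockquotes where text precedes a nested tumblr_blog link."""

    def __init__(self):
        super().__init__()
        # One entry per open <blockquote>: has text been seen inside it yet?
        self.text_seen = []
        self.split_found = False

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.text_seen.append(False)
        elif tag == "a" and "tumblr_blog" in (dict(attrs).get("class") or ""):
            # An attribution link that follows earlier text in the same
            # blockquote means the quoted post was split around its parent.
            if self.text_seen and self.text_seen[-1]:
                self.split_found = True

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.text_seen:
            self.text_seen.pop()

    def handle_data(self, data):
        if self.text_seen and data.strip():
            self.text_seen[-1] = True


def has_split_quote(body_html):
    """Return True if the legacy body HTML looks like a split reblog."""
    detector = SplitQuoteDetector()
    detector.feed(body_html)
    detector.close()
    return detector.split_found
```

Calling has_split_quote(post['body']) on the example above should flag it, because "Does this mean you swallowed it whole..." appears inside the outer blockquote before the ofthefog attribution link. This only detects the problem; it does not repair the ordering.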

@AprilSylph
Member

this odd behaviour can also be observed on themes that don't use {block:Reblogs} - such a post i've bookmarked is https://nightpool.tumblr.com/post/189125412740

this isn't exactly helpful, but i wanted to add that this is an observable issue even without directly interacting with the API! it's a very strange problem considering it happens so inconsistently.

@nightpool

I filed this as a bug with Tumblr support about a year ago, and IIRC it's been happening pretty consistently since September 2019. There's something about the code that turns new NPF reblogs into the legacy HTML format that's just entirely broken. Posts made on desktop seem to be immune—afaict, this only happens with reblogs originated in NPF, like the ones produced by the mobile app.

In case a Tumblr engineer wants to take a look, the ticket number I have is 7593494 on Zendesk.

@cyle
Member

cyle commented Nov 3, 2020

Hello! Great questions.

A lot of the answers here are loaded behind this note in the API docs:

> Important note: Post content can be in two formats: legacy or Neue Post Format (NPF). By default, posts returned from this endpoint (and any other endpoint that returns posts) will be in the legacy post-type-based content formats described here. NPF-created posts from the official Tumblr mobile apps will be returned as text/regular posts to maintain backwards compatibility. To help transition to an NPF-only world, you can pass along the npf=true query parameter to force all posts returned here to be in Neue Post Format (also described here).

Specifically, these days, this part is doing a lot of work: "maintain backwards compatibility" (when npf=false)

> I sometimes encounter posts where the nested blockquotes in body don't correctly convey the structure of the reblog chain. ... Why does this happen?

Simply put, posts created via the new format don't use the legacy blockquote-based HTML storage approach. We make a best effort to maintain backwards compatibility, so that you can still use npf=false, but the real solution here is for every API consumer to move to using npf=true and leverage the NPF JSON for post content. That's the "correct" approach, though I'd define it instead as the "safest" approach. Eventually, npf=false will no longer be an option, or it will be completely deprecated and we won't be fixing any bugs that stem from its usage.

> If this is simply an inherent problem with legacy format, and the recommended mitigation is to use NPF for post consumption, then it would be nice to add this recommendation to the docs.

Totally! We'll work to make this clearer. 👌

> I filed this as a bug with Tumblr support about a year ago, and IIRC it's been happening pretty consistently since September 2019. There's something about the code that turns new NPF reblogs into the legacy HTML format that's just entirely broken. Posts made on desktop seem to be immune; afaict, this only happens with reblogs originated in NPF, like the ones produced by the mobile app.

Yeah I remember this being filed and I thought we had fixed it, but since it seems like we haven't, I'll see if I can reopen the issue.

However, long story short here is that the conversion of NPF posts to HTML is "best effort", and there are some weird edge cases. We'll keep trying to do a best effort conversion on our side, but all API consumers should move to leveraging NPF JSON if they can.
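
As a rough illustration of what consuming the NPF JSON might look like, here is a small sketch that rebuilds the reblog chain from an NPF post dict. It assumes the post was fetched with npf=true (whether a given pytumblr version forwards that parameter is something to check), that each trail item has a blog object with a name and a content list of blocks, and that the top-level content list holds only what the current reblogger added. The helper names are hypothetical, and only text blocks are handled.

```python
def npf_text(blocks):
    """Join the text of NPF 'text' blocks; other block types are skipped."""
    return "\n".join(
        block.get("text", "") for block in blocks if block.get("type") == "text"
    )


def chain_from_npf(post):
    """Return [(blog_name, text), ...] for an NPF post dict, oldest first."""
    chain = []
    for item in post.get("trail", []):
        name = item.get("blog", {}).get("name", "<unknown>")
        chain.append((name, npf_text(item.get("content", []))))
    # The top-level content is assumed to be the current reblogger's addition;
    # a reblog that adds nothing has an empty list and is left out.
    own_blocks = post.get("content", [])
    if own_blocks:
        chain.append((post.get("blog_name", "<unknown>"), npf_text(own_blocks)))
    return chain
```

Because every trail item carries its own block list, there is no nesting to untangle, which is the property the legacy blockquote format lacks.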

@nightpool

nightpool commented Nov 3, 2020

@cyle I appreciate that API consumers should migrate where possible, but it seems impractical to ask every pre-2019 theme author to migrate, and this problem is only going to become more prevalent as the new web dashboard moves to making NPF-native posts and reblogs. Understandably, not everything that is possible with NPF is going to be feasible to maintain backwards compatibility for, but the reblog trail seems like a pretty fundamental feature 😅

> Yeah I remember this being filed and I thought we had fixed it, but since it seems like we haven't, I'll see if I can reopen the issue.

I believe the fix implemented at the time only worked for one-paragraph reblogs; multi-paragraph reblogs still have the same problem.

@cyle
Member

cyle commented Nov 3, 2020

> I appreciate that API consumers should migrate where possible, but it seems impractical to ask every pre-2019 theme author to migrate, and this problem is only going to become more prevalent as the new web dashboard moves to making NPF native posts and reblogs.

Agreed, which is why my comments are limited to API consumers, and not themes. The blog network is a different set of expectations, and we'll maintain backwards compatibility there for the foreseeable future. That being broken is a bug that we should fix.

@cyle
Member

cyle commented Nov 4, 2020

I've added a blurb to the API docs covering the API side of this.

As for the reblog layout bug, it's on our radar; hopefully you'll soon see a note on the Changes blog when we've fixed it. I'll see if we can reopen your specific ticket @nightpool

Thanks! 😄
