Records exceed 10 Kb because HTML and content in JSON are the same. #110

Open
frankmeeuwsen opened this issue Jan 4, 2019 · 3 comments

Comments

@frankmeeuwsen

I want to report a bug: while indexing my blog, the command repeatedly stops because the record to index is too big, mainly because of the HTML part of the JSON.

What is the current behavior?

When I index my site, I get errors like

The jekyll-algolia plugin detected that one of your records exceeds the 10.00 Kb
record size limit.

title:    Bloghelden, 10 jaar webloggen in Nederland
url:      /bloghelden/
size:     11.99 Kb

Most probable keys causing the issue:
   html (5.76 Kb), content (5.74 Kb), tags (0.05 Kb)

What is your expected behavior?

I would expect my relatively small blog posts to be indexed. I see from the log file that the index includes both the HTML and the content part. They are almost the same, apart from the removal of the p tag in the content part. See the enclosed JSON for details.
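To see which keys dominate a record like the one in the log, a minimal sketch along the following lines could help. It assumes the record is available as a Ruby hash; `key_sizes` and the sample values are hypothetical and are not part of the jekyll-algolia plugin.

```ruby
require 'json'

# Hypothetical helper: return [key, size_in_bytes] pairs for a record,
# largest first, measuring each value as it would be encoded in JSON.
def key_sizes(record)
  record
    .map { |key, value| [key.to_s, value.to_json.bytesize] }
    .sort_by { |_, size| -size }
end

# Made-up sample record; a real one would come from the enclosed log file.
sample = {
  html: '<p>Some <em>paragraph</em> text.</p>',
  content: 'Some paragraph text.',
  tags: ['blog']
}

key_sizes(sample).each do |key, size|
  puts format('%s (%.2f Kb)', key, size / 1024.0)
end
```

When `html` and `content` come out at nearly the same size, as in the error above, the two keys hold almost the same text and together account for most of the record.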

jekyll-algolia-record-too-big.log

Git repository to reproduce the issue:

https://github.com/frankmeeuwsen/DTD-Blog

Ruby version used:

ruby 2.4.2p198 (2017-09-14 revision 59899)

Jekyll version used:

jekyll 3.7.3

@pixelastic
Collaborator

Hello and thanks for the report,

The plugin works by splitting each page of your blog into several records. By default, it creates one record per <p> of text in the page. Each of those records must not weigh more than 10 Kb, and the plugin tries its best to shrink the content until it fits under that threshold. Having both the plain text and the HTML version of the paragraph in the record is expected.
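The idea described above can be illustrated with a rough sketch. This is not the plugin's actual code: `paragraph_records`, `record_size`, `too_big?` and `RECORD_SIZE_LIMIT` are hypothetical names, and a regular expression stands in for a real HTML parser only to keep the example free of dependencies.

```ruby
require 'json'

# Assumed limit in bytes, matching the 10 Kb threshold mentioned above.
RECORD_SIZE_LIMIT = 10 * 1024

# Build one record per <p> element, each holding both the HTML and the
# plain-text version of that paragraph.
def paragraph_records(html, title:, url:)
  html.scan(%r{<p\b[^>]*>.*?</p>}m).map do |paragraph|
    {
      title: title,
      url: url,
      html: paragraph,
      content: paragraph.gsub(/<[^>]+>/, '').strip
    }
  end
end

# Size of a record once serialized to JSON, in bytes.
def record_size(record)
  JSON.generate(record).bytesize
end

def too_big?(record)
  record_size(record) > RECORD_SIZE_LIMIT
end

page = '<p>First paragraph.</p><p>Second <em>paragraph</em>.</p>'
paragraph_records(page, title: 'Example', url: '/example/').each do |record|
  puts "#{record[:content]}: #{record_size(record)} bytes, too big: #{too_big?(record)}"
end
```

With this kind of split, a record only gets close to the limit when a single paragraph is very long, or when the whole page ends up in one record.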

What does not seem to work as expected on your page is that the whole content of the page (all paragraphs) ends up in a single record. I'll clone your repo and investigate; thanks for the report.

@pixelastic
Collaborator

@frankmeeuwsen I don't see the jekyll-algolia plugin in your Gemfile on master. Do you have a branch with your setup where I could reproduce the whole issue you're having?

@frankmeeuwsen
Author

@pixelastic Thanks for your explanation. My apologies, I forgot to commit the branch 😊. You can find it on https://github.com/frankmeeuwsen/DTD-Blog/tree/20190104-Algolia

The strangest thing happened, though. When I checked the branch again just now and tried to index, it suddenly went through. I will look further into what happened, but this is strange. In a good way 🤣.
