This issue was moved to a discussion.
Performance issues when rendering large PDFs #94
Comments
@jesusgp22 You probably want to use some kind of virtualization library for displaying PDFs with a lot of pages, such as react-virtualized. Maybe this is useful to you. |
Hey, thank you so much for your answer, I'll definitely check this out. You might want to add a note about this to the react-pdf documentation to help others with the same performance issues, or even add this as a core feature for large docs in the future. |
Following up on this, @michaeldzjap: I am watching some presentations on react-virtualized, and it looks like it will break the text search feature. Is this a trade-off that I can't get around? |
I am not familiar with the text search feature I have to admit. But I suspect that it relies on the text layer for each page to be rendered in order to be able to find all the relevant results for a specific search query (e.g. a word that could be located anywhere in the document). The whole point of virtualizing a collection of elements ( I don't think there is an easy way around this unfortunately. A solution could be to keep a virtual representation of a text layer of each page in memory (like how React does this for HTML elements) and search through that instead. Might be possible. |
That's an interesting approach, I am guessing this will most likely break browser text search feature anyway, in any case I think it is ok to implement this using just a regular search box element. Now the questions are:
|
I think you would need to dig into pdf.js for this, relying on the react-pdf api probably is not enough. You can get the text content for a page using this apparently: page.getTextContent().then(function(textContent) { ... });
Yes, that is a tricky one... You'd know the page number. Maybe it should be a two-step operation or something. 1 - Search through the virtual text layers for a query and keep a list of all pages that match. 2 - For each page in the result of step 1, see if it is rendered. If it is, you can probably find the word rather easily, because I think each word is rendered as a separate HTML element in a text layer. If the page is not rendered yet, scroll to it with react-virtualized so that it will be rendered, and then find the HTML element that contains the first occurrence of the word/query in the text layer element tree. Something like the above. I might be thinking too simplistically about this; I haven't actually tried it myself. But this is how I would approach things initially, I think. |
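Step 1 of the idea above could look roughly like the sketch below. It assumes `pdf` is an already loaded pdf.js document exposing `numPages`, `getPage()` and `page.getTextContent()`; the helper names `buildTextIndex` and `findMatchingPages` are made up for illustration and are not part of react-pdf or pdf.js.

```javascript
// Build an in-memory "virtual text layer": one lowercase string per page.
// Assumption: `pdf` is a loaded pdf.js document (numPages, getPage).
async function buildTextIndex(pdf) {
  const pageTexts = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber += 1) {
    const page = await pdf.getPage(pageNumber);
    const textContent = await page.getTextContent();
    // Each text item carries a `str` fragment; join them into one searchable string.
    const text = textContent.items.map((item) => item.str || '').join(' ');
    pageTexts.push(text.toLowerCase());
  }
  return pageTexts;
}

// Return the 1-based numbers of all pages whose text contains the query.
function findMatchingPages(pageTexts, query) {
  const needle = query.trim().toLowerCase();
  if (needle === '') return [];
  const matches = [];
  pageTexts.forEach((text, index) => {
    if (text.includes(needle)) matches.push(index + 1);
  });
  return matches;
}
```

Joining fragments with a space is a simplification: a word split across two text items would not be found. Step 2 (scrolling to a matching page and locating the element in the rendered text layer) depends on the virtualization library and is left out here.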
I was wondering if the biggest performance issue was rendering the text layer or the canvas, in case rendering the canvas is an issue, it might be possible to ask pdf.js to only render the text layer? |
@jesusgp22 Nope, you can toggle text content and annotations on/off, but canvas is currently not behind a flag. I don't see a good reason against having it toggleable, though :) |
@michaeldzjap Any reason for this? If you use |
@wojtekmaj Yes, my wording was rather poor. What I meant is that
Yes. This is exactly what I do to cache all document page widths and heights on initial load when using react-pdf together with react-virtualized. |
Thank you both for this amazing discussion 👍 |
We are also having trouble loading long PDFs. We are loading a 17 MB PDF and the application crashes, and since we have customers with 100 MB+ PDFs, crashing is not an option. This example, which is also a React wrapper for PDF.js, seems to work for us. It tricks PDF.js into loading only the currently visible page and the ten previous pages. It looks like it has something to do with the wrapper div's styles, because when you change some of the styles it loses its lazy-loading behaviour. I couldn't reproduce this trick with your lib. But we liked react-pdf so much that we are still trying to adapt this lazy-load trick to it. We like the fact that your lib has no default toolbox and that it maps its props to pdf.js handlers/configs, so we can develop our own customized toolbox. So we would be glad to see it working better with long PDFs, maybe using the trick that yurydelendik/pdfjs-react uses (it's a shame that I couldn't reproduce it with your lib!) |
@MarcoNicolodi I found that react-virtualized worked really badly with react-pdf. I implemented the approach of rendering only a few pages, but to make things work you have to render a div that has the dimensions of each page you don't render. You can integrate this 100% with react-pdf by using the document object that react-pdf returns and its getPage and page.getViewport methods to get the page dimensions. I built my own algorithm to detect which pages are visible, and I run it every time the user scrolls or a resize event happens. |
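The placeholder approach described above might be sketched like this. It is not the commenter's actual code: `cachePageSizes` and `getVisiblePages` are hypothetical helpers, `pdf` is assumed to be a loaded pdf.js document, and pages are assumed to be stacked vertically with no gaps.

```javascript
// Cache the dimensions of every page so placeholder divs can be sized
// without rendering the page itself.
async function cachePageSizes(pdf, scale = 1) {
  const sizes = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber += 1) {
    const page = await pdf.getPage(pageNumber);
    // Recent pdf.js releases take an options object here;
    // older releases took the scale as a bare number.
    const viewport = page.getViewport({ scale });
    sizes.push({ width: viewport.width, height: viewport.height });
  }
  return sizes;
}

// Return the 1-based numbers of the pages that intersect the visible area.
// Call this on every scroll and resize event.
function getVisiblePages(pageHeights, scrollTop, viewportHeight) {
  const visible = [];
  let top = 0;
  pageHeights.forEach((height, index) => {
    const bottom = top + height;
    if (bottom > scrollTop && top < scrollTop + viewportHeight) {
      visible.push(index + 1);
    }
    top = bottom;
  });
  return visible;
}
```

Pages outside the returned list would be rendered as empty divs with the cached width and height, so the scrollbar keeps its correct length.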
Hey everyone, There is some good news too, though. If I can suggest something,

import React, { Component } from 'react';
import { Document, Page } from 'react-pdf/build/entry.webpack';

import './Sample.less';

export default class Sample extends Component {
  state = {
    file: './test.pdf',
    numPages: null,
    pagesRendered: null,
  };

  onDocumentLoadSuccess = ({ numPages }) =>
    this.setState({
      numPages,
      pagesRendered: 0,
    });

  onRenderSuccess = () =>
    this.setState(prevState => ({
      pagesRendered: prevState.pagesRendered + 1,
    }));

  render() {
    const { file, numPages, pagesRendered } = this.state;

    /**
     * The amount of pages we want to render now. Always 1 more than already rendered,
     * no more than total amount of pages in the document.
     */
    const pagesRenderedPlusOne = Math.min(pagesRendered + 1, numPages);

    return (
      <div className="Example">
        <header>
          <h1>react-pdf sample page</h1>
        </header>
        <div className="Example__container">
          <div className="Example__container__document">
            <Document
              file={file}
              onLoadSuccess={this.onDocumentLoadSuccess}
            >
              {
                Array.from(
                  new Array(pagesRenderedPlusOne),
                  (el, index) => {
                    const isCurrentlyRendering = pagesRenderedPlusOne === index + 1;
                    const isLastPage = numPages === index + 1;
                    const needsCallbackToRenderNextPage = isCurrentlyRendering && !isLastPage;

                    return (
                      <Page
                        key={`page_${index + 1}`}
                        onRenderSuccess={
                          needsCallbackToRenderNextPage ? this.onRenderSuccess : null
                        }
                        pageNumber={index + 1}
                      />
                    );
                  },
                )
              }
            </Document>
          </div>
        </div>
      </div>
    );
  }
}

Of course you can do much more - add placeholders, check on scroll which pages need rendering, keep info on whether all pages so far were rendered... I believe in your creativity ;) And if I can be of any help regarding API, please let me know! |
Hey, could you share this example? |
@MarcoNicolodi Yes, I think it could even be included as a PR to react-pdf at some point. I don't have the time to share the code right now, but I will later today. |
I think the right place for that is Wiki. :) I highly encourage you to add your experiences on a new page there. |
I've tried to make react-pdf work with react-virtualized, but failed. So I made a minimal demo with the original pdf.js and react-virtualized, and it's pretty fast. Weird. https://github.com/crapthings/react-pdf-viewer/blob/master/client/pdf.js |
@crapthings I think it may be something related to how react-virtualized measures rows before final rendering. It may unmount a component after initial measurement. I'm no expert in react-virtualized but perhaps you could force height or otherwise somehow disable these measurements? If something forces Page to unmount itself, it will cancel rendering. Should retry rendering on mounting again though 🤔 |
Well I want to contribute to the discussion because a project of mine depends on this awesome library and I found a way to implement lazy-loading with no extra dependencies. The only caveat is that I lost the natural scrolling behavior but the user can still scroll and change pages. The logic is that the visible component is just one page and there is another component with

import React from 'react';
import { Document, Page } from 'react-pdf';

const pdfjs = require('pdfjs-dist/build/pdf.min.js');

pdfjs.PDFJS.workerSrc = '../src/assets/pdf.worker.min.js';
pdfjs.PDFJS.cMapUrl = '../src/assets/cmaps/';
pdfjs.PDFJS.cMapPacked = true;

export default class App extends React.Component {
  constructor(props) {
    super(props);
    this.state = {
      numPages: null,
      pageIndex: null,
      // Placeholder: the base64-encoded PDF content goes here
      binaryPDFContent: somebase64string,
    };
    // Set the width before the first render so <Page> never receives undefined
    this.PDFWidth = 400;
    // Bind once so the same function reference is added and removed as a listener
    this.onScrollPDF = this.onScrollPDF.bind(this);
  }

  componentDidMount() {
    document.getElementById('pdfContainer').addEventListener('wheel', this.onScrollPDF);
  }

  componentWillUnmount() {
    document.getElementById('pdfContainer').removeEventListener('wheel', this.onScrollPDF);
  }

  onDocumentLoadSuccess(nPages) {
    if (this.state.pageIndex == null) {
      this.setState({
        numPages: nPages,
        pageIndex: 0,
      });
    } else if (this.state.pageIndex >= nPages) {
      // pageIndex is zero-based, so the last valid index is nPages - 1
      this.setState({
        numPages: nPages,
        pageIndex: nPages - 1,
      });
    } else {
      this.setState({
        numPages: nPages,
      });
    }
  }

  onScrollPDF(event) {
    let delta = null;
    if (event.wheelDelta) {
      delta = event.wheelDelta;
    } else {
      delta = -1 * event.deltaY;
    }
    // This is where some customization can happen
    if (delta < -20) {
      this.nextPage();
    } else if (delta > 10) {
      this.previousPage();
    }
  }

  previousPage() {
    if (this.state.pageIndex > 0) {
      this.setState({
        pageIndex: this.state.pageIndex - 1,
      });
    }
  }

  nextPage() {
    if (this.state.pageIndex + 1 < this.state.numPages) {
      this.setState({
        pageIndex: this.state.pageIndex + 1,
      });
    }
  }

  render() {
    let PDFContainerHeight = 600;
    return (
      <div
        id="pdfContainer"
        style={{ width: this.PDFWidth, height: PDFContainerHeight, overflow: 'hidden' }}
      >
        <Document
          file={{ data: `data:application/pdf;base64,${this.state.binaryPDFContent}` }}
          onLoadSuccess={(pdf) => {
            this.onDocumentLoadSuccess(pdf.numPages);
          }}
          className="pdfPreview"
          rotate={0}
        >
          <Page
            key={`page_${this.state.pageIndex + 1}`}
            width={this.PDFWidth}
            pageNumber={this.state.pageIndex + 1}
            className="pdfPage"
            renderMode="svg"
          />
          <FakePage
            // This is where we can customize how many pages we need cached
            pages={Math.min(this.state.numPages, this.state.pageIndex + 20)}
            width={this.PDFWidth}
          />
        </Document>
      </div>
    );
  }
}

// Renders the upcoming pages into a hidden div so they are cached before the user reaches them
class FakePage extends React.Component {
  constructor(props) {
    super(props);
  }

  render() {
    return (
      <div style={{ display: 'none' }}>
        {
          Array.from(
            new Array(this.props.pages),
            (el, index) => (
              <Page
                key={`page_${index + 1}`}
                width={this.props.width}
                className="pdfPage"
                renderMode="svg"
                pageNumber={index + 1}
              />
            ),
          )
        }
      </div>
    );
  }
}
|
That is pretty damn sweet! Does it work with React-PDF 2.x? I'd be genuinely surprised, think 3.0.0 would be the first version to handle all of that correctly! |
I did not try it with 2. Lately I use your alpha release (which is awesome btw and I haven't got any errors yet). Still I could not find a way to make the infinite scrolling behavior and keep track of which page is visible. When I tried to apply this solution to your example I got pages with very big and variable gaps between them. I also could not trigger the caching of the next pages (since I could not get the visible page). I believe with a little time and patience a solution can be found that does not involve external libraries. |
Have you tried react-virtualized as other folks here suggested? |
I tried it but did not get very far. I believe that it would give unnecessary overhead for my app and (for my specific use case) it still did not solve the problem of having programmatic access to which page is visible. tbh I did not spend much time on it as the fakepage trick struck me right after I installed the library :P If I find the time I will try to find the solution to the problem and close this issue once and for all. |
@fetacore I initially went the custom route, but after a lot of trying different things out I settled on react-virtualized. That was quite a pain in the ass to get to play nicely with react-pdf, but I did manage to get it to work and it made things a hell of a lot easier in the end.
That is definitely possible with react-virtualized. I needed this functionality in my app as well. You can use the |
Hey, @fetacore @jesusgp22 @wojtekmaj and all friends of react-pdf! Has anyone gotten further on the subject? I really appreciate the discussion so far, and would love to hear your latest ideas, thanks! |
@Lukars I've gotten react-pdf to play nicely with react-window (a newer and more compact version of react-virtualized). I used the VariableSizeList, which can window different page heights/widths. On Document documentLoadSuccess, I call a function inherited from a parent component, setPageHeights, which caches all pageHeights (scaled dynamically to the parent component's height). I then pass pageHeights as a prop to the child again, and there pass it into the itemSize prop of the VariableSizeList like itemSize={pageHeights[index]}. When resizing or zooming in/out, I call setPageHeights again with the updated scale and parent container, and use the VariableSizeList method resetAfterIndex(0) after updating pageHeights to clear and update the internally cached row heights in the VariableSizeList. I also made a very small PureComponent that wraps Page in a div. This div then gets passed the styles from react-window. I think this part is pretty crucial for performance. On the built solution it looks pretty much 100% smooth, even when scrolling very fast through PDFs with 150+ pages. |
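The height caching in the comment above could be sketched as follows. The helper names are hypothetical, and this version scales pages to a container width rather than to the parent's height as the commenter did; treat it as one possible variant under that assumption.

```javascript
// Scale cached page sizes so every page fits the container width.
// `pageSizes` is assumed to be an array of { width, height } at scale 1.
function computePageHeights(pageSizes, containerWidth) {
  return pageSizes.map(({ width, height }) =>
    Math.round(height * (containerWidth / width)));
}

// react-window's VariableSizeList expects `itemSize` to be a function
// from row index to pixel height, so wrap the cached array in a getter.
function makeItemSizeGetter(pageHeights, fallbackHeight = 0) {
  return (index) => pageHeights[index] ?? fallbackHeight;
}
```

The getter would be passed as the `itemSize` prop of `VariableSizeList`. After recomputing the heights on resize or zoom, calling `resetAfterIndex(0)` on the list ref makes react-window drop its cached row offsets, as described in the comment.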
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 14 days. |
I am a bit confused about why the PDF.js viewer from Mozilla (https://mozilla.github.io/pdf.js/web/viewer.html) can load large PDFs instantly, can zoom instantly, and lets you scroll through the pages with minimal buffering. When using this library as is, without performance optimization, large PDFs take at least 30 seconds to load, and I can't zoom at all because it makes the webpage freeze. I think that @ngoclinhng's solution performs the best; with it I can load large PDFs faster, but there is still a delay whenever I zoom. At least I am able to zoom, though... One more optimization that helped me was setting However I think that there are some serious performance issues with this library, since it is much slower than PDF.js by itself. I hope you can figure out why this is performing much slower than vanilla PDF.js. Edit: It would be VERY helpful if you could start viewing a PDF without downloading the entire file first. |
I was able to successfully implement a virtualized pdf view for larger pdfs. I basically just made a scroll listener that calculates the position of the current page, then it updates an array that contains the numbers of the pages that should be rendered. That array is contained in the state of my component, so when it changes, it updates what pages are rendered. It works quite well with 20 pages being rendered at a time and allowing the user to be unaware that previous pages are being removed from the DOM and additional pages are being added. If anyone's interested in seeing my implementation, let me know and I can try to make a concise version to paste here. |
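A minimal sketch of the windowing logic described above, assuming the scroll listener has already worked out the current page. `getPagesToRender` is a hypothetical helper, and the default window of 20 pages comes from the comment.

```javascript
// Return the 1-based page numbers to keep mounted around the current page.
// The window is clamped to [1, numPages] and keeps its size near the edges
// whenever the document has enough pages.
function getPagesToRender(currentPage, numPages, windowSize = 20) {
  const half = Math.floor(windowSize / 2);
  let first = Math.max(1, currentPage - half);
  const last = Math.min(numPages, first + windowSize - 1);
  // Pull the start back if the window was cut off at the end of the document.
  first = Math.max(1, last - windowSize + 1);

  const pages = [];
  for (let page = first; page <= last; page += 1) {
    pages.push(page);
  }
  return pages;
}
```

Storing the returned array in component state means React only mounts those pages; everything else can be replaced by sized placeholders.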
Yes, I am interested, and I'd appreciate it if you could share it. Thanks. |
this is a demo for large pdfs: https://github.com/zhoumy96/react-pdf-large-files |
"without performance optimization" is the key here. React-PDF is NOT a PDF viewer - it is only a tool to build one. If you want to browse 100 page PDFs, you need to take similar precautions as if you were trying to open 100 images at once, or 100 videos, or whatever. You wouldn't open them all at once, would you?
You can, as long as Range header is supported by the server you're serving the content from. |
@zhoumy96 Good example. Rendering pages only when they are actually needed is key to a performant PDF viewer. |
Here's my take on hooking React-PDF to React-Window. https://codesandbox.io/s/react-pdf-react-window-x3xzzg |
Would you mind giving an example of how to do this? I tried setting My PDF files are hosted in an AWS S3 bucket which does support range requests to my knowledge. |
Hmm, not sure about that. I'm pretty sure PDF.js will request only as much data as needed, if it's possible, e.g. when you only want to display Page 1. If it's not happening, it's on PDF.js side. There may be something else that I don't know about that might prevent partial download from happening, e.g. PDF built in a specific way or something. |
@wojtekmaj Thank you for providing the great demo. When I try to implement it in my application and load a large PDF file, I notice that memory keeps increasing as I load further pages or switch between pages, and the memory is not released until I close the browser tab. Any idea how we can optimize it? Thank you! |
Ok, I figured out what the problem here was: The PDF files have to be "linearized", which means that they are saved in a way so that the file can be requested in chunks. On a Mac, I just opened the PDF in Preview, reordered a page and put it back (otherwise it doesn't save if no changes are made), and hit File -> Save. It should save linearized by default. On Windows you will probably have to find a third party app to do it (like Acrobat). I hope that helps anyone that was having the same issue. |
This is working perfectly. Thank you so much! EDIT: Any help making the navigation buttons work with this code? EDIT #2: The solution for going to a specific page is given in the react-window documentation. Previous page and next page can be done similarly. |
I think it still loads the full PDF. You can check by passing in <Document... onLoadProgress={onDocumentLoadProgress}
The logs will display the total load |
@wojtekmaj @EricLiu0614 Any update on this? I had a similar problem: when I play around with scrolling on huge PDFs, there seems to be a memory leak, which causes the page to crash. |
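A handler for that check might look like the sketch below. It assumes the `onLoadProgress` callback receives an object with `loaded` and `total` byte counts; `describeLoadProgress` is a made-up helper name.

```javascript
// Turn a progress event into a readable log line.
function describeLoadProgress({ loaded, total }) {
  if (!total) {
    return `${loaded} bytes loaded (total unknown)`;
  }
  const percent = Math.round((loaded / total) * 100);
  return `${loaded} of ${total} bytes loaded (${percent}%)`;
}

// Pass this as onLoadProgress={onDocumentLoadProgress} on <Document>.
function onDocumentLoadProgress(progress) {
  console.log(describeLoadProgress(progress));
}
```

If the last logged `loaded` value equals the full file size before any page beyond the first is requested, the document is most likely being downloaded in one piece rather than in ranges.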
The browser does not expose Accept-Ranges and Content-Range to scripts on cross-origin responses by default. Without these two headers, pdf.js mistakenly thinks that the server does not support range requests, and then directly requests the entire file. My CORS configuration for s3 bucket.
Expose these headers so that react-pdf can see the headers it needs in order to stream. |
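The commenter's own configuration is not shown, so the following is only an illustrative sketch of an S3 CORS rule set that exposes the two headers named above. The origin is a hypothetical placeholder, and the rule is written as a JavaScript value so it can be serialized to the JSON that the S3 console accepts.

```javascript
// Hypothetical CORS rules for an S3 bucket serving PDFs to a web app.
const corsRules = [
  {
    AllowedOrigins: ['https://app.example.com'], // placeholder origin
    AllowedMethods: ['GET', 'HEAD'],
    AllowedHeaders: ['*'],
    // The headers named in the comment above: make them readable by scripts.
    ExposeHeaders: ['Accept-Ranges', 'Content-Range'],
    MaxAgeSeconds: 3000,
  },
];

// Print the JSON to paste into the bucket's CORS settings.
console.log(JSON.stringify(corsRules, null, 2));
```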
The memory leak seems to be due to code sandbox and not react-pdf + react-window example. Did you try to run the example locally? |
This might be a good question for the pdf.js community itself, but how can rendering large PDFs be handled better with react-pdf?
pdf.js suggests not rendering more than 25 pages at a time:
https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#allthepages
I even had to add this to my component to keep React from trying to re-create the virtual DOM of the Document:
The problem is that I also need to dynamically set the width of the document on user interaction, so I can't avoid re-creating the virtual DOM after width changes. Is there any way I can achieve this with your lib?