[Bug]: OCR failed: java.nio.file.NoSuchFileException #3861

Open
@raphaelyancey

Installation Method

Docker

The Problem

Trying to OCR a 27-page PDF (scanned with the CamScanner Android app, which added an image watermark, in case that helps) results in this error.

Version of Stirling-PDF

0.46.2

Last Working Version of Stirling-PDF

No response

Page Where the Problem Occurred

No response

Docker Configuration

version: '3.3'
services:
  web:
    image: frooodle/s-pdf:latest
    ports:
      - '8081:8080'
    volumes:
      - ./data/tessdata:/usr/share/tessdata

Relevant Log Output

java.nio.file.NoSuchFileException: /tmp/ocr_process12064792784153996598/output/page_0.pdf
	at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
	at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:224)
	at java.base/java.nio.channels.FileChannel.open(FileChannel.java:309)
	at java.base/java.nio.channels.FileChannel.open(FileChannel.java:369)
	at org.apache.pdfbox.io.RandomAccessReadBufferedFile.<init>(RandomAccessReadBufferedFile.java:110)
	at org.apache.pdfbox.io.RandomAccessReadBufferedFile.<init>(RandomAccessReadBufferedFile.java:98)
	at org.apache.pdfbox.Loader.loadPDF(Loader.java:355)
	at org.apache.pdfbox.Loader.loadPDF(Loader.java:311)
	at org.apache.pdfbox.Loader.loadPDF(Loader.java:254)
	at org.apache.pdfbox.multipdf.PDFMergerUtility.legacyMergeDocuments(PDFMergerUtility.java:463)
	at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:367)
	at org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(PDFMergerUtility.java:347)
	at stirling.software.SPDF.controller.api.misc.OCRController.processPdfWithOCR(OCRController.java:156)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:258)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:191)
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:986)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:891)
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
	at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:547)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
	at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
	at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1622)
	at org.eclipse.jetty.ee10.websocket.servlet.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:195)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
	at stirling.software.SPDF.config.MetricsFilter.doFilterInternal(MetricsFilter.java:46)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
	at stirling.software.SPDF.config.EnterpriseEndpointFilter.doFilterInternal(EnterpriseEndpointFilter.java:32)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
	at org.springframework.web.servlet.resource.ResourceUrlEncodingFilter.doFilter(ResourceUrlEncodingFilter.java:66)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
	at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
	at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:114)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
	at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1594)
	at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1555)
	at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:819)
	at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:436)
	at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:470)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:575)
	at org.eclipse.jetty.ee10.servlet.SessionHandler.handle(SessionHandler.java:717)
	at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:1064)
	at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:740)
	at org.eclipse.jetty.server.handler.EventsHandler.handle(EventsHandler.java:81)
	at org.eclipse.jetty.server.Server.handle(Server.java:182)
	at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:662)
	at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:416)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:480)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:443)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:311)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:979)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1209)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1164)
	at java.base/java.lang.Thread.run(Thread.java:1583)



web-1  | 11:50:29.440 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:50:29.452 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:50:36.730 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:50:43.874 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:50:51.175 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:50:59.512 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:51:09.059 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:51:17.322 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:51:26.022 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:51:33.838 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:51:41.370 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:51:48.635 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:51:56.700 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:52:05.080 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:52:14.823 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:52:23.362 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:52:32.194 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:52:39.354 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:52:48.927 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:52:57.129 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:53:07.192 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:53:15.250 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:53:23.042 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:53:32.886 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:53:42.361 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:53:51.461 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:54:05.151 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:54:10.184 [qtp890525812-33] WARN  o.a.pdfbox.pdmodel.font.PDType1Font - Using fallback font LiberationSans for Times-Roman
web-1  | 11:54:13.017 [qtp890525812-33] INFO  o.a.p.e.u.DeletingRandomAccessFile - Successfully deleted temp file: /tmp/ocr_process12064792784153996598/input.pdf
web-1  | 11:54:13.037 [qtp890525812-33] WARN  o.e.j.ee10.servlet.ServletChannel - handleException /api/v1/misc/ocr-pdf java.nio.file.NoSuchFileException: /tmp/ocr_process12064792784153996598/output/page_0.pdf
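
The trace ends with PDFMergerUtility.mergeDocuments trying to open output/page_0.pdf, which does not exist at that point. One possible diagnostic is to check which per-page files are present before merging. The following is a minimal sketch, assuming the per-page outputs are named page_<i>.pdf as in the log above. The class and method names are hypothetical, and this is not Stirling-PDF's actual code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class MissingPageCheck {

    // Hypothetical helper: returns the expected page_<i>.pdf paths
    // (i = 0 .. pageCount-1) that are absent from outputDir.
    static List<Path> missingPages(Path outputDir, int pageCount) {
        List<Path> missing = new ArrayList<>();
        for (int i = 0; i < pageCount; i++) {
            Path page = outputDir.resolve("page_" + i + ".pdf");
            if (!Files.isRegularFile(page)) {
                missing.add(page);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Simulate an output directory where only page_1.pdf was written.
        Path dir = Files.createTempDirectory("ocr_check");
        Files.createFile(dir.resolve("page_1.pdf"));

        // page_0.pdf is reported as missing; page_1.pdf is not.
        List<Path> missing = missingPages(dir, 2);
        System.out.println("Missing page files: " + missing);
    }
}
```

Running a check like this before the merge step would surface a clear "page N was never produced" message instead of a NoSuchFileException raised from inside the merger.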

Additional Information

The tessdata/ directory is the unzipped contents of https://github.com/tesseract-ocr/tessdata/archive/refs/heads/main.zip

Browsers Affected

No response

No Duplicate of the Issue

  • I have verified that there are no existing issues raised related to my problem.
