Connection timeout when streaming large files? #439

Closed
SimunKaracic opened this issue Aug 25, 2023 · 1 comment

SimunKaracic commented Aug 25, 2023

I am seeing slow performance and connection timeouts when streaming "large" files (only tested with files of 10 MB or more).
Example: streaming the contents of every file in a folder.
The folder contains 2 files, each filled with JSON lines.
I was able to reproduce the issue on AWS with the following code:

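    // List every object under the data folder prefix
    s3.listAllObjects(s3Config.extractedDataBucket, ListObjectOptions.default.copy(prefix = Some(s"${dataFolder}")))
      // Stream each object line by line and decode every line as JSON
      .flatMap(file => s3
        .streamLines(s3Config.extractedDataBucket, file.key)
        .map(line => jawn.decode[T](line))
      )
      // Keep successfully decoded values, dropping decode failures
      .collectRight
      // Batch up to 1000 elements or 10 seconds, then count the batches
      .groupedWithin(1000, 10.seconds)
      .runCount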

The equivalent code runs quickly and to completion when run locally against MinIO.
Does anyone have ideas on how to fix this, or at least how to increase the timeout?

Version:
"dev.zio" %% "zio-s3" % "0.4.2.1"

Stacktrace:

    2023-08-25 21:26:56 java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-4 [ACTIVE]
        at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:936)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:332)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:320)
        at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1911)
        at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1877)
        at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1845)
        at org.opensearch.client.RestHighLevelClient.bulk(RestHighLevelClient.java:364)
        at org.******.ingestion.opensearch.OpensearchClient.putDocs(OpensearchClient.scala:233)
        at org.*******.ingestion.load.Repository.$anonfun$dataStream$2(Repository.scala:26)
        at zio.stream.ZChannel.$anonfun$mapOutZIOPar$26(ZChannel.scala:647)
        at zio.ZIO$InterruptibilityRestorer$MakeInterruptible$.apply(ZIO.scala:5864)
        at zio.stream.ZChannel.$anonfun$mapOutZIOPar$25(ZChannel.scala:647)
        at zio.ZIO.$anonfun$raceFirst$1(ZIO.scala:1368)
        at zio.ZIO.raceWith(ZIO.scala:1457)
        at zio.ZIO.raceWith$(ZIO.scala:1453)
        at zio.ZIO$OnSuccessAndFailure.raceWith(ZIO.scala:5788)
        at zio.ZIO.$anonfun$race$1(ZIO.scala:1290)
        at zio.ZIO$.$anonfun$fiberIdWith$1(ZIO.scala:3175)
        at zio.ZIO$.$anonfun$descriptorWith$1(ZIO.scala:3075)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1067)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:1024)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:967)
        at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:381)
        at zio.internal.FiberRuntime.evaluateMessageWhileSuspended(FiberRuntime.scala:504)
        at zio.internal.FiberRuntime.drainQueueOnCurrentThread(FiberRuntime.scala:220)
        at zio.internal.FiberRuntime.run(FiberRuntime.scala:139)
        at zio.internal.ZScheduler$anon$4.run(ZScheduler.scala:476)
    Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-4 [ACTIVE]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387)
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:98)
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:40)
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:506)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211)
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280)
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
        at java.base/java.lang.Thread.run(Thread.java:833)
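
On the "increase the timeout" question: the frames in the trace above show the exception being raised from `org.opensearch.client.RestClient` during a `bulk` call, so one place to try a longer timeout is the OpenSearch REST client's request config. The following is only a sketch under assumptions, not a confirmed fix: the object name, the host, port and scheme arguments, and both timeout values are hypothetical, and it assumes the Apache HttpClient 4 based client that the `org.apache.http` frames suggest.

```scala
import org.apache.http.HttpHost
import org.apache.http.client.config.RequestConfig
import org.opensearch.client.{RestClient, RestClientBuilder, RestHighLevelClient}

object OpensearchTimeouts {
  // Hypothetical values; pick them based on how long a bulk request may legitimately take.
  val connectTimeoutMs = 5000
  val socketTimeoutMs  = 120000

  // Applies the longer timeouts to the request config used for every request.
  def customize(b: RequestConfig.Builder): RequestConfig.Builder =
    b.setConnectTimeout(connectTimeoutMs).setSocketTimeout(socketTimeoutMs)

  // Builds a low-level client builder with the customized request config.
  def builder(host: String, port: Int): RestClientBuilder =
    RestClient
      .builder(new HttpHost(host, port, "https"))
      .setRequestConfigCallback(new RestClientBuilder.RequestConfigCallback {
        override def customizeRequestConfig(b: RequestConfig.Builder): RequestConfig.Builder =
          customize(b)
      })

  // Wraps the builder in the high-level client that appears in the trace.
  def client(host: String, port: Int): RestHighLevelClient =
    new RestHighLevelClient(builder(host, port))
}
```

A longer socket timeout only gives slow bulk requests more time to finish; it does not explain why they are slow on AWS and fast locally.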

@SimunKaracic
Author

Switching to many smaller files helped, but I still had to replace the flatMap with a mapZIO and collect each inner stream into a chunk before the connection timeouts went away.
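
For anyone landing here, a minimal sketch of that change, assuming ZIO 2 streams. The in-memory `files` map and the `listKeys` and `streamLines` helpers are hypothetical stand-ins for the bucket, `s3.listAllObjects` and `s3.streamLines`, so that the snippet runs without S3. Re-emitting the lines with `flattenChunks` is also an assumption, since the comment above does not say what happens to the chunk afterwards.

```scala
import zio._
import zio.stream._

object CollectPerFile {
  // Hypothetical in-memory stand-in for the bucket: key -> JSON lines.
  val files: Map[String, Chunk[String]] = Map(
    "data/a.jsonl" -> Chunk("""{"id":1}""", """{"id":2}"""),
    "data/b.jsonl" -> Chunk("""{"id":3}""")
  )

  // Stand-in for listing the keys under a prefix.
  def listKeys(prefix: String): ZStream[Any, Throwable, String] =
    ZStream.fromIterable(files.keys.filter(_.startsWith(prefix)).toList.sorted)

  // Stand-in for streaming one object line by line.
  def streamLines(key: String): ZStream[Any, Throwable, String] =
    ZStream.fromChunk(files(key))

  // Original shape: each inner stream is consumed as downstream pulls from it.
  def lazyLines(prefix: String): ZStream[Any, Throwable, String] =
    listKeys(prefix).flatMap(key => streamLines(key))

  // Workaround shape: read each file fully into a Chunk, then re-emit its lines.
  def eagerLines(prefix: String): ZStream[Any, Throwable, String] =
    listKeys(prefix)
      .mapZIO(key => streamLines(key).runCollect)
      .flattenChunks

  // Runs a stream to completion and returns the number of emitted lines.
  def count(s: ZStream[Any, Throwable, String]): Long =
    Unsafe.unsafe { implicit u =>
      Runtime.default.unsafe.run(s.runCount).getOrThrowFiberFailure()
    }
}
```

Both shapes emit the same lines. The difference is that `eagerLines` finishes reading a file before anything downstream sees it, at the cost of holding that whole file in memory, which is why it pairs better with many smaller files.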
