Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: handle messages concurrently in pickup worker #119

Merged
merged 14 commits into from
Mar 14, 2023
Merged

Conversation

olizilla
Copy link
Contributor

@olizilla olizilla commented Mar 8, 2023

  • switch to an sqs lib that polls for new messages concurrently rather than in batches. This is rad as now we'll make better use of each container!
  • treat timeouts as a regular failure. Let the message go back on the queue for another node to try. After 3 goes it'll go to the dead letter queue and be marked as failed. This is fine, and simplifies the pickup worker a lot, as it doesn't need to talk to dynamo or determine the cause of an error.
  • rewrite pickup worker so we can compose it out of single-responsibility pieces instead of having to pass through the giant config ball. It's so much simpler now! You can figure our what it does from it's parts! sqsPoller + carFetcher + s3Uploader
  const pickup = createPickup({
    sqsPoller: createSqsPoller({
      queueUrl: SQS_QUEUE_URL,
      maxInFlight: BATCH_SIZE
    }),
    carFetcher: new CarFetcher({
      ipfsApiUrl: IPFS_API_URL,
      fetchTimeoutMs: TIMEOUT_FETCH
    }),
    s3Uploader: new S3Uploader({
      bucket: VALIDATION_BUCKET
    })
  })

see: https://github.com/PruvoNet/squiss-ts/

fixes #13
fixes #116
fixes #101

License: MIT

WIP

- switch to an sqs lib that polls for new messages concurrently rather than in batches
- rewrite pickup worker so we can compose it out of single-responsibilty pieces instead of having to pass through the giant config ball.

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 8, 2023 22:23 Inactive
@github-actions
Copy link

github-actions bot commented Mar 8, 2023

package-lock.json changes

Click to toggle table visibility
Name Status Previous Current
@eslint/js ADDED - 8.35.0
@humanwhocodes/module-importer ADDED - 1.0.1
builtins ADDED - 5.0.1
eslint-plugin-n ADDED - 15.6.1
esquery UPDATED 1.4.0 1.5.0
globals UPDATED 13.18.0 13.20.0
grapheme-splitter ADDED - 1.0.4
js-sdsl ADDED - 4.3.0
linked-list ADDED - 2.1.0
squiss-ts ADDED - 4.4.1
ts-type-guards ADDED - 0.7.0

@seed-deploy
Copy link

seed-deploy bot commented Mar 8, 2023

View stack outputs

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 9, 2023 12:51 Inactive
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 9, 2023 13:48 Inactive
test the api is available when start is called.

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 9, 2023 14:55 Inactive
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 9, 2023 16:39 Inactive
@olizilla olizilla marked this pull request as ready for review March 9, 2023 17:02
@olizilla olizilla changed the title feat: handle concurrent batches feat: handle messages concurrently Mar 9, 2023
CMD [ "npm", "start", "-w", "pickup", "--silent"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put less npm noise in the logs

@olizilla olizilla changed the title feat: handle messages concurrently feat: handle messages concurrently in pickup worker Mar 9, 2023
@olizilla olizilla requested a review from alanshaw March 9, 2023 17:09
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 9, 2023 21:44 Inactive
fixes #101

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 13, 2023 20:17 Inactive
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 13, 2023 22:02 Inactive
@olizilla olizilla added this to the w3up phase 3 milestone Mar 14, 2023
@olizilla olizilla mentioned this pull request Mar 14, 2023
10 tasks
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 14, 2023 11:08 Inactive
@seed-deploy
Copy link

seed-deploy bot commented Mar 14, 2023

Stack outputs updated
  • pr119-pickup-BasicApiStack

    Name Value
    ApiEndpoint https://ojhw356dr0.execute-api.us-west-2.amazonaws.com
    CustomDomain https://pr119.pickup.dag.haus
    S3EventsTopicARN arn:aws:sns:us-west-2:505595374361:pr119-pickup-S3Events
  • pr119-pickup-PickupStack

    Name Value
    ServiceSQSQueue88C16836 pr119-pickup-Pin
    ServiceSQSQueueArnDDC422AE arn:aws:sqs:us-west-2:505595374361:pr119-pickup-Pin
    ServiceValidatorSQSQueue5667BEDA pr119-pickup-ValidationPinQueue
    ServiceValidatorSQSQueueArnCF57A57B arn:aws:sqs:us-west-2:505595374361:pr119-pickup-ValidationPinQueue

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 14, 2023 11:26 Inactive
2 mins is still timeout after node restarts

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 14, 2023 11:43 Inactive
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 14, 2023 12:55 Inactive
Copy link
Collaborator

@vasco-santos vasco-santos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work @olizilla , as you mentioned things got way simpler and make better use of each container 🙌🏼 Also nice finding with squiss SQS Poller, much nicer!

Leaving some non blocking feedback below. Would also be great if follow up PRs would be more scoped, instead of such big code changes when possible.

README.md Outdated
Comment on lines 99 to 102
2/3rs of home internet users can upload faster than 20Mbit/s (fixed broadband), at which 32GiB would transfer in 3.5hrs.

see: https://www.speedtest.net/global-index
see: https://www.omnicalculator.com/other/download-time?c=GBP&v=fileSize:32!gigabyte,downloadSpeed:5!megabit
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a note in the subsection above? ### FETCH_TIMEOUT_MS

@@ -16,12 +16,12 @@
"@aws-sdk/lib-dynamodb": "^3.112.0",
"@aws-sdk/lib-storage": "^3.97.0",
"@ipld/car": "5.1.0",
"async-retry": "1.3.3",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool that we moved to p-* 👍🏼

*/
export async function fetchCar (cid, ipfsApiUrl, downloadError, timeoutMs = 30000, downloadStatusManager) {
export async function fetchCar ({ cid, ipfsApiUrl, signal }) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do default for ipfsApiUrl like below?

Comment on lines +78 to +80
if (!msg.isHandled) {
await msg.release() // back to the queue, try again
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we do release everywhere? Should we put this after the else instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm likely to bring back "directly update pin status in db and fail where dag is too big" so left it so that each condition decides what to do with the message.

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr119 March 14, 2023 15:38 Inactive
@olizilla olizilla merged commit 92935f8 into main Mar 14, 2023
@olizilla olizilla deleted the concurrently branch March 14, 2023 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Worker node fails to pull new message when processing 1 large item batch Timeout 4h not working Batching
2 participants