
Question: verdaccio v4 scalability. #1459

Closed
favoyang opened this issue Sep 4, 2019 · 8 comments

Comments

@favoyang
Contributor

favoyang commented Sep 4, 2019

I'm planning to operate an open registry service, which may see more traffic than a typical private registry at certain times. My main candidates are verdaccio, cnpm, and codebox-npm, and verdaccio looks promising. I'm curious how well verdaccio v4 scales as of 2019.

Apart from an old thread, #103, I didn't find much up-to-date information about scalability. After doing some research, I've narrowed it down to two similar solutions:

  • AWS: load balancer + EC2 group (multiple verdaccio instances) + Elastic File System
  • Or the exact same architecture, with the Elastic File System replaced by a third-party S3 storage plugin.

It seems straightforward that an elastic file system or S3 plus multiple instances should work, until I found that there's a .verdaccio-db.json file, which seems to act as a simple shared database containing the full package list. So how does this work in a cluster environment, even with S3? Write operations from multiple instances (adding a new package) would try to modify the same file, which could get out of sync, not to mention how changes would be broadcast to the other instances.
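
To make the concern concrete, here's an illustrative sketch (not verdaccio's actual code; the mount path and JSON shape are assumptions) of the lost-update race when multiple instances read-modify-write one shared JSON file:

```typescript
// Illustrative only: two instances behind a load balancer both rewrite the
// shared package list on EFS/S3 with a read-modify-write, so one update can
// silently overwrite the other. Path and JSON shape are assumptions.
import { promises as fs } from 'fs';

const DB_PATH = '/shared-storage/.verdaccio-db.json'; // hypothetical EFS mount

async function addPackage(name: string): Promise<void> {
  // 1. Read the current package list.
  const db = JSON.parse(await fs.readFile(DB_PATH, 'utf8')) as { list: string[] };
  // 2. Modify it in memory.
  if (!db.list.includes(name)) {
    db.list.push(name);
  }
  // 3. Write the whole file back. If another instance ran steps 1-3 in the
  //    meantime, whichever write lands last wins and the other package is lost.
  await fs.writeFile(DB_PATH, JSON.stringify(db));
}
```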

I haven't dug into the verdaccio code too much; is there something obvious I missed? Any real-world scalability tips would also be helpful.

@favoyang
Contributor Author

favoyang commented Sep 4, 2019

@wzrdtales do you have any comments on this, since you posted the original issue #103?

@juanpicado
Member

hi @favoyang

Issue #103 basically triggered the idea of not depending on the file system, which was the basis of Sinopia.

In Verdaccio 3 we introduced the Storage API, which basically allows you to plug in your own storage. For example, using AWS S3 buckets: https://github.com/verdaccio/docker-examples/tree/master/amazon-s3-docker-example/v4

The default authentication is also file-system based, and it too can be replaced by plugins with their own storage persistence.
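
For context, a storage plugin roughly has to cover two levels: the package list (the state that .verdaccio-db.json holds in the default implementation) and per-package metadata/tarball access. A simplified, approximate sketch of that shape (the real interface is IPluginStorage in @verdaccio/types; the signatures here are not exact):

```typescript
// Approximate shape of a verdaccio storage plugin; see @verdaccio/types
// (IPluginStorage) for the real interface. Signatures here are simplified.
type Callback = (err: Error | null, result?: unknown) => void;

interface StoragePluginSketch {
  // Package-list level: this is what .verdaccio-db.json holds in the default
  // file-system storage, and what a custom backend must own instead.
  add(name: string, callback: Callback): void;
  remove(name: string, callback: Callback): void;
  get(callback: Callback): void;
  getSecret(): Promise<string>;
  setSecret(secret: string): Promise<unknown>;
  // Per-package level: returns a handle for reading/writing one package's
  // metadata and tarballs.
  getPackageStorage(packageName: string): unknown;
}
```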

> until I found that there's a .verdaccio-db.json file

The .verdaccio-db.json file is only created if you use the default storage.

> So how does this work in a cluster environment, even with S3? Write operations from multiple instances (adding a new package) would try to modify the same file, which could get out of sync, not to mention how changes would be broadcast to the other instances.

For this, and for real experience scaling Verdaccio, you'll need to ask the community, which from time to time provides feedback on how they use it. Unfortunately I can't help much in that area; I've heard of crazy setups, but I've never had the time to run crazy experiments myself beyond the docker-examples repository, which is mostly for mentoring purposes.

@juanpicado
Member

I put my two cents here: https://twitter.com/verdaccio_npm/status/1170374861618892800. That's all I can do from my side.

@favoyang
Contributor Author

favoyang commented Sep 7, 2019

Thanks for sharing the info. My main concern came from the https://verdaccio.org/docs/en/amazon documentation, which describes a stack with one load balancer + multiple verdaccio worker instances + Elastic File System. Because the .verdaccio-db.json file is obviously a shared stateful point, that setup won't work unless some kind of concurrency control is in place, like the trivial master/slave mechanism used by most relational databases, or unless the state is delegated to a fast enough backend like Redis so that the verdaccio workers stay stateless.
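
To illustrate the Redis idea (a hypothetical sketch, not an existing plugin; the key name and helper functions are made up), atomic set operations close the read-modify-write window so workers can stay stateless:

```typescript
// Hypothetical sketch of keeping the package list in Redis instead of a
// shared JSON file. SADD/SREM/SMEMBERS are atomic on the server, so
// concurrent publishes from different workers cannot lose updates.
import Redis from 'ioredis';

const redis = new Redis('redis://cache:6379'); // hypothetical Redis endpoint
const PACKAGES_KEY = 'verdaccio:packages';     // illustrative key name

export async function addPackage(name: string): Promise<void> {
  await redis.sadd(PACKAGES_KEY, name);
}

export async function removePackage(name: string): Promise<void> {
  await redis.srem(PACKAGES_KEY, name);
}

export async function listPackages(): Promise<string[]> {
  return redis.smembers(PACKAGES_KEY);
}
```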

However, the S3 Docker example you shared uses the Remitly/verdaccio-s3-storage plugin. If no .verdaccio-db.json is created in that backend, we should be fine. Most of the time verdaccio operations fall into a rare-write / heavy-read pattern, which is relatively easy to scale.

I'll close this ticket, but feel free to add more comments.

@favoyang favoyang closed this as completed Sep 7, 2019
@juanpicado
Member

The docker example uses https://github.com/verdaccio/monorepo/tree/master/plugins/aws-s3-storage, which is under our control :-)

@apexskier

> If no .verdaccio-db.json is created in that backend, we should be fine.

Just a note on this: this file is also used in the S3 storage plugin.

https://github.com/verdaccio/monorepo/blob/61e23d53b82c6948601f28337fdd3054b9336914/plugins/aws-s3-storage/src/index.ts#L169
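
In other words, the pattern at that spot boils down to roughly the sketch below (an illustration, not the plugin's actual code; bucket and key names are assumptions). A plain GetObject/PutObject round trip has no compare-and-swap, so two concurrent publishes can still clobber each other:

```typescript
// Illustration of the S3 read-modify-write pattern on a shared db object.
// Bucket/key names are assumptions; this is not the plugin's actual code.
import S3 from 'aws-sdk/clients/s3';

const s3 = new S3();
const Bucket = 'my-registry-bucket';
const Key = 'verdaccio/.verdaccio-s3-db.json';

async function addPackage(name: string): Promise<void> {
  const res = await s3.getObject({ Bucket, Key }).promise();
  const db = JSON.parse((res.Body as Buffer).toString('utf8')) as { list: string[] };
  if (!db.list.includes(name)) {
    db.list.push(name);
  }
  // PutObject simply replaces the object: last writer wins, so an update made
  // by another instance between the get and the put is silently dropped.
  await s3.putObject({ Bucket, Key, Body: JSON.stringify(db) }).promise();
}
```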

@favoyang
Contributor Author

favoyang commented Apr 24, 2020

Hi @apexskier

A quick update: any store that relies on verdaccio-db.json is not scalable. AFAIK, the only scalable store is verdaccio-google-cloud, which uses Datastore to persist the package info list.
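
The reason a Datastore-backed list can scale is presumably that each package becomes its own record, so publishing is a single atomic write rather than a read-modify-write of one shared blob. A rough sketch of that idea (an illustration of the concept only, not the plugin's actual code; the entity kind and data shape are made up):

```typescript
// Sketch of a per-package record layout using Google Cloud Datastore.
// Each package is its own entity, so adds/removes never touch a shared file.
// Kind name and data shape are illustrative, not the plugin's actual schema.
import { Datastore } from '@google-cloud/datastore';

const datastore = new Datastore();
const KIND = 'VerdaccioPackage'; // hypothetical entity kind

async function addPackage(name: string): Promise<void> {
  await datastore.save({ key: datastore.key([KIND, name]), data: { name } });
}

async function removePackage(name: string): Promise<void> {
  await datastore.delete(datastore.key([KIND, name]));
}

async function listPackages(): Promise<string[]> {
  const [entities] = await datastore.runQuery(datastore.createQuery(KIND));
  return entities.map((e: { name: string }) => e.name);
}
```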

The aws-s3 backend is not scalable; the discussion has moved to verdaccio/monorepo#317 and the candidate PR is verdaccio/monorepo#275, but the review process is very slow.

verdaccio-minio is relatively new to me. It uses a single JSON file (the db) to store the package info list, but the good news is that it doesn't cache the list in memory; every action reloads the file from disk. It has a resolved issue, barolab/verdaccio-minio#13, about running in a cluster, which resulted in removing the in-memory cache. However, that is probably still not good enough; a race condition is still possible between the reload and the write-back.
