pinto list --format="%M" is slow #157

Open
mar-kolya opened this issue Apr 25, 2014 · 10 comments

Comments

@mar-kolya
Contributor

The above command runs about 10 times slower than a plain list.

It looks like the reason is that it fetches a lot of additional data, via extra SQL queries, for each package.

Would it make sense to store that data in the DB when a package is added?

@thaljef
Owner

thaljef commented Apr 25, 2014

Yeah, the "main module" feature was an afterthought. It's only a heuristic,
so it looks at all the other packages to determine which is the most likely
to be the main package.
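To illustrate why this is a candidate for computing once rather than on every list, here is a minimal shell sketch of one plausible "main module" heuristic. It is an assumption, not Pinto's actual rule: within each distribution, prefer the package whose name matches the distribution name (with :: turned into -), otherwise fall back to the shortest package name. The input format (one "DIST PACKAGE" pair per line) and the function name guess_main_packages are hypothetical.

```sh
# Hypothetical sketch, not Pinto's implementation.
# stdin: lines of "DIST_NAME PACKAGE_NAME", e.g. "Foo-Bar Foo::Bar"
# stdout: one "DIST_NAME MAIN_PACKAGE" line per distribution (unordered)
guess_main_packages() {
    awk '
    NF < 2 { next }                    # ignore malformed lines
    {
        dist = $1
        pkg = $2
        flat = pkg
        gsub(/::/, "-", flat)          # Foo::Bar -> Foo-Bar
        if (flat == dist)
            exact[dist] = pkg          # name match with the dist wins
        if (!(dist in shortest) || length(pkg) < length(shortest[dist]))
            shortest[dist] = pkg       # fallback: shortest package name
    }
    END {
        for (d in shortest) {
            best = (d in exact) ? exact[d] : shortest[d]
            print d, best
        }
    }'
}
```

Usage: printf '%s\n' 'Foo-Bar Foo::Bar::Util' 'Foo-Bar Foo::Bar' | guess_main_packages | sort. For this particular heuristic the answer depends only on the packages inside one distribution, so it could in principle be computed when the distribution is added and stored with it, instead of being re-derived with extra queries on every list.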

If we're fairly confident that the heuristic is right (or just good enough)
then yes, we should put it in the DB.

That will require a schema change. There are a couple other schema changes
I want to do, so I've been waiting to do them all together. Perhaps now is
finally the time.

@mar-kolya
Contributor Author

Unfortunately, the 'main module' feature looks like the only way to get a complete list of packages to install from a given stack in a form that can be fed to cpanm, so it seems fairly important. Unless I'm missing some other way to do that.

@thaljef
Owner

thaljef commented Apr 25, 2014

This is a bit crude, and it may not be what you really want:

pinto -r you/repo list --format '%a/%f' | sort | uniq | cpanm ...

I usually encourage folks to use some kind of external file as the canonical list of top-level dependencies, rather than just assuming that everything in the stack should be installed.

However, not everyone wants to bother with that. Some folks just want to use the stack itself as the canonical list. So that's why I came up with the roots command. But it may not be adequate.

The whole concept of managing a private CPAN like this is pretty novel. I'm still trying to figure out the right tools and processes. So I really appreciate your feedback.

@mar-kolya
Contributor Author

> pinto -r you/repo list --format '%a/%f' | sort | uniq | cpanm ...

I've tried that. It looks like passing an actual 'filename' forces cpanm to always install things, even if I already have them. This is prohibitively slow when I rebuild my project. That's why I'm looking for a way to get the 'main' packages.

@mar-kolya
Contributor Author

I guess one more thing to consider: is it possible to minimize the number of tools needed?

I would prefer to be able to (ideally) do something like pinto -r my-project-repo install and have it do 'the right thing' for my project. Dealing with a cpanfile seems like work that could be avoided.

@thaljef
Owner

thaljef commented Apr 25, 2014

> I've tried that. It looks like when I provide an actual 'filename' this forces cpanm to always install things, even if I have them already.

That makes sense.

> I would prefer to have the ability to (ideally) do something like pinto -r my-project-repo install and it does 'the right thing' for my project.

Yes, that is certainly the spirit. If the roots command worked better and the "main module" feature were faster, you might do something like this:

pinto -r my-repo roots | pinto -r my-repo install

And we could probably clean that up to just be:

pinto -r my-repo install --roots

> Dealing with a cpanfile seems like work that can be avoided.

At the end of the day, you still have to record your application dependencies somewhere. You could do it implicitly by just saying "my app needs everything in this stack". Or you can do it explicitly with a text file or cpanfile or META.json or whatever.

Even though we have the roots command (such as it is), I still recommend keeping an explicit list of your direct dependencies. And then let Pinto deal with all the indirect ones.

For example, here's part of the Makefile for Stratopan. Once you check out from git, all you have to do is run make dependencies to install all the right modules into local/.

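# Install the direct deps listed in etc/cpanfile (and their deps) from the local mirror into local/
.PHONY: dependencies
dependencies:
	sbin/cpanm --mirror-only --mirror file://$(CURDIR)/cpan --local-lib-contained $(CURDIR)/local \
	    --cpanfile etc/cpanfile --notest --quiet --installdeps .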

The etc/cpanfile is something I maintain by hand. Interestingly, I actually generated the first version of the cpanfile from the Pinto repository using the roots command and some --format options.
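A minimal sketch of that last step, assuming you already have a list of module names, one per line and optionally followed by a version. How that list is produced (for example, which --format string is passed to roots) is not spelled out here, so it is left out. The function name to_cpanfile is hypothetical; it only wraps each line in cpanfile requires syntax.

```sh
# Hypothetical helper: "Module::Name [VERSION]" lines on stdin -> cpanfile lines on stdout
to_cpanfile() {
    awk -v q="'" '
    NF == 0 { next }                                            # skip blank lines
    NF == 1 { printf "requires %s%s%s;\n", q, $1, q; next }     # name only
    { printf "requires %s%s%s, %s%s%s;\n", q, $1, q, q, $2, q } # name plus version
    '
}
```

For example, printf '%s\n' 'Foo::Bar' 'Baz::Qux 1.05' | to_cpanfile > etc/cpanfile. Note that in a cpanfile, requires 'Baz::Qux', '1.05'; means "at least 1.05", not an exact pin; with --mirror-only, the stack is what actually decides which version gets installed.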

@thaljef
Owner

thaljef commented Apr 25, 2014

Since you said you had been using a cpanfile (and possibly carton), I'm curious why you want to switch to pinto. Of course, I have my own reasons for preferring pinto. But what didn't you like about the cpanfile, and what are you hoping to get from pinto instead? Knowing that might help me find the right solution for you.

@mar-kolya
Contributor Author

Ok.

The story is as follows: we have many Perl projects, and for reasons beyond this discussion they use very different sets of dependencies. Some use 'modern Perl', some are quite old. Our original approach was to have a 'corporate CPAN mirror' used by all projects; it contains a CPAN mirror plus some in-house modules. We use cpanfile, cpanm and carton to install from it.

Unfortunately, this situation is far from ideal, largely because all projects 'feed' from the same source. When that source changes (one project gets a new dependency, an old version is removed from the mirror, etc.), all projects suffer.

So our plan is to use pinto and have a stack for each project. Then each project can set up its own dependencies and not, umm, depend on other projects. And that is all working fine.

The only minor problem is that the cpanfile is now sort of redundant. Since we have a custom CPAN mirror (a pinto stack) for each project, we just want everything from that stack installed when the project is built. Maintaining a separate cpanfile seems unnecessary.

So we first looked at 'roots', but it is unreliable: it doesn't guarantee that everything will be installed.

We settled on using list --format '%M-%p' and grepping for all the 'main' packages. We feed this list to cpanm and everything gets installed; this is the approach we currently use. But '%M' makes list slow, hence this 'bug'.

As an additional benefit, it would be nice to get only the 'real roots' rather than all packages. That way the package list would be shorter, and cpanm should run faster when we rebuild.
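For reference, the usual meaning of a root is a package that nothing else in the stack requires. A minimal shell sketch of that computation, assuming a hypothetical input format of one line per package, "PACKAGE PREREQ...". This is not Pinto's output format, and not necessarily how Pinto's roots command works; find_roots is an illustrative name.

```sh
# Hypothetical sketch: print packages that no other package requires, sorted.
# Input (files given as arguments, or stdin): "PACKAGE [PREREQ ...]" per line
find_roots() {
    awk '
    NF > 0 {
        pkg[$1] = 1                                # every package in the stack
        for (i = 2; i <= NF; i++) needed[$i] = 1   # everything somebody requires
    }
    END {
        for (p in pkg)
            if (!(p in needed)) print p            # nobody requires it: a root
    }' "$@" | sort
}
```

Feeding module names like these to cpanm, rather than distribution paths, lets it skip anything that is already installed and up to date. One caveat of this simple definition: packages that only require each other in a cycle never show up as roots, so such a list can still miss things.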

@thaljef
Owner

thaljef commented Apr 27, 2014

> The story is as follows...

That sounds like all the right reasons to use Pinto. @hartzell has a similar situation, so he might have some suggestions for you too.

Another possibility to consider is having a separate repository for each project. Depending on how your projects are deployed, that would allow you to keep the Pinto repositories closer to the applications that use them (for example, stashing the Pinto repo inside the VCS with the code). For home-grown modules that are shared across projects, you could establish a shared Pinto repository. Then each project-specific repository would point to the shared repo (and a public CPAN) as the upstream source. In theory, it is similar to forking a project on GitHub, but instead of forking source code, you are forking a stack of dependencies.

That sort of "repository network" has always been part of the design for Pinto. However, I'm not sure if people really use it that way. I suspect most just use a single repository that pulls from the public CPAN only. But in your case, it might be worth exploring.

Meanwhile, I will work on improving the roots command. I think that will solve most of your current problems. It would help if you could provide a concrete example (e.g. a set of pinto commands) that demonstrates the misbehavior.

@mar-kolya
Contributor Author

Well, yes, we plan to have a large number of per-project, per-environment pinto stacks/repos.

I'll update the ticket about the roots command with details on how to reproduce the problem.

@mar-kolya changed the title from pinto list --formar="%M" is slow to pinto list --format="%M" is slow on May 11, 2014