pinto list --format="%M" is slow #157

mar-kolya · 2014-04-25T19:13:02Z

The above command runs about 10 times slower than a plain list.

It looks like the reason is that it fetches a lot of additional data with extra SQL queries for each package.

Would it make sense to store that data in the DB when a package is added?

thaljef · 2014-04-25T19:30:28Z

Yeah, the "main module" feature was an afterthought. It's only a heuristic,
so it looks at all the other packages to determine which is the most likely
to be the main package.

If we're fairly confident that the heuristic is right (or just good enough)
then yes, we should put it in the DB.

That will require a schema change. There are a couple other schema changes
I want to do, so I've been waiting to do them all together. Perhaps now is
finally the time.

mar-kolya · 2014-04-25T19:34:07Z

Unfortunately the 'main module' thing looks like the only way to get a 'complete list of packages to install' from a given stack in a form that can be fed into cpanm, so this seems fairly important. Unless I'm missing some other way to do that.

thaljef · 2014-04-25T20:05:30Z

This is a bit crude, and it may not be what you really want:

pinto -r you/repo list --format '%a/%f' | sort | uniq | cpanm ...

I usually encourage folks to use some kind of external file as the canonical list of top-level dependencies, rather than just assuming that everything in the stack should be installed.

However, not everyone wants to bother with that. Some folks just want to use the stack itself as the canonical list. So that's why I came up with the roots command. But it may not be adequate.

The whole concept of managing a private CPAN like this is pretty novel. I'm still trying to figure out the right tools and processes. So I really appreciate your feedback.

mar-kolya · 2014-04-25T20:16:15Z

pinto -r you/repo list --format '%a/%f' | sort | uniq | cpanm ... - I've tried that. It looks like when I provide an actual 'filename', this forces cpanm to always install things, even if I have them already. This is prohibitively slow when I do a 'rebuild' of my project. That's why I'm looking for a way to get the 'main' packages.

mar-kolya · 2014-04-25T20:18:21Z

I guess one more thing to consider: is it possible to minimize the number of tools needed?

I would prefer to have the ability to (ideally) do something like pinto -r my-project-repo install and have it do 'the right thing' for my project. Dealing with a cpanfile seems like work that can be avoided.

thaljef · 2014-04-25T20:52:51Z

I've tried that. It looks like when I provide an actual 'filename' this forces cpanm to always install things, even if I have them already.

That makes sense.

I would prefer to have the ability to (ideally) do something like pinto -r my-project-repo install and have it do 'the right thing' for my project.

Yes, that is certainly the spirit. If the roots command worked better and the "main module" feature were faster, you might do something like this:

pinto -r my-repo roots | pinto -r my-repo install

And we could probably clean that up to just be:

pinto -r my-repo install --roots

Dealing with a cpanfile seems like work that can be avoided.

At the end of the day, you still have to record your application dependencies somewhere. You could do it implicitly by just saying "my app needs everything in this stack". Or you can do it explicitly with a text file or cpanfile or META.json or whatever.

Even though we have the roots command (such as it is), I still recommend keeping an explicit list of your direct dependencies. And then let Pinto deal with all the indirect ones.

For example, here's part of the Makefile for Stratopan. Once you check out from git, all you have to do is run make dependencies to install all the right modules into local/.

.PHONY: dependencies
# Install the modules listed in etc/cpanfile into local/, pulling only from the mirror in cpan/
dependencies:
	sbin/cpanm --mirror-only --mirror file://$(CURDIR)/cpan --local-lib-contained $(CURDIR)/local \
	  --cpanfile etc/cpanfile --notest --quiet --installdeps .

The etc/cpanfile is something I maintain by hand. Interestingly, I actually generated the first version of the cpanfile from the Pinto repository using the roots command and some --format options.

thaljef · 2014-04-25T23:40:58Z

Since you said you had been using a cpanfile (and possibly carton), I'm curious why you want to switch to pinto. Of course, I have my own reasons for preferring pinto. But what didn't you like about the cpanfile, and what are you hoping to get from pinto instead? Knowing that might help me find the right solution for you.

mar-kolya · 2014-04-26T05:25:09Z

Ok.

The story is as follows: we have many Perl projects and, for reasons beyond this discussion, they use very different sets of dependencies. Some use 'modern Perl', some are quite old. Our original approach was to have a 'corporate CPAN mirror' used by all projects; it contains a CPAN mirror and some in-house modules. We use cpanfile, cpanm and carton to install from it.

Unfortunately this situation is far from ideal, to a large extent because all projects 'feed' from the same source. This means that when this source changes (one project gets a new dependency, an old version is removed from the mirror, etc.), all projects suffer.

So our plan is to use pinto and to have a stack for each project. Then each project can set up its own dependencies and not, umm, depend on other projects. And all of that is working fine.

The only minor problem is that the cpanfile is now sort of redundant. Since we have a custom CPAN mirror (pinto stack) for each project, we just want everything from that stack installed when the project is built, so maintaining a separate cpanfile feels like duplicated effort.

So we first looked at 'roots', but it is unreliable: it doesn't guarantee that everything will be installed.

The approach we currently use is running list --format '%M-%p' and grepping for the 'main' packages; we feed that list to cpanm and everything gets installed. But '%M' makes list slow, hence this 'bug'.

As an additional benefit, it would be nice to get only the 'real roots' instead of all packages; that way the package list would be shorter and cpanm should run faster when we rebuild.
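A minimal shell sketch of the "grep for main packages" step described above. It assumes that `%M` prints a single marker character in front of main modules; the character is assumed here to be `m` (overridable via the hypothetical `MAIN_MARK` variable), which I have not verified against pinto itself, so check the real output of `pinto list --format '%M-%p'` first. `main_packages` is a hypothetical helper name.

```shell
#!/bin/sh
# main_packages: hypothetical filter for lines shaped like "<flag>-<package>",
# i.e. the output of `pinto list --format '%M-%p'`.
# ASSUMPTION: %M prints $MAIN_MARK (default "m") for a main module.
main_packages() {
  mark="${MAIN_MARK:-m}"
  # Keep lines that start with "<mark>-", strip that prefix,
  # then sort and drop duplicate package names.
  awk -v mark="$mark" 'index($0, mark "-") == 1 { print substr($0, length(mark) + 2) }' | sort -u
}

# Usage (not executed here):
#   pinto -r my/repo list --format '%M-%p' | main_packages | cpanm ...
```

Because the list contains package names rather than distribution file names, cpanm can skip modules that are already up to date, which is the behavior wanted for rebuilds.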

thaljef · 2014-04-27T20:44:56Z

The story is as follows...

That sounds like all the right reasons to use Pinto. @hartzell has a similar situation, so he might have some suggestions for you too.

Another possibility to consider is having a separate repository for each project. Depending on how your projects are deployed, that would allow you to keep the Pinto repositories closer to the applications that use them (for example, stashing the Pinto repo inside the VCS with the code). For home-grown modules that are shared across projects, you could establish a shared Pinto repository. Then each project-specific repository would point to the shared repo (and a public CPAN) as the upstream source. In theory, it is similar to forking a project on GitHub, but instead of forking source code, you are forking a stack of dependencies.

That sort of "repository network" has always been part of the design for Pinto. However, I'm not sure if people really use it that way. I suspect most just use a single repository that pulls from the public CPAN only. But in your case, it might be worth exploring.

Meanwhile, I will work on improving the roots command. I think that will solve most of your current problems. It would help if you could provide a concrete example (e.g. a set of pinto commands) that demonstrates the misbehavior.

mar-kolya · 2014-04-28T15:53:23Z

Well, yes, we plan to have a large number of per-project, per-environment pinto stacks/repos.

I'll update the ticket about the roots command with details on how to reproduce the problem.

thaljef mentioned this issue on May 10, 2014, in #158 "There should be a way to tag package as root" (closed).

mar-kolya changed the title from "pinto list --formar="%M" is slow" to "pinto list --format="%M" is slow" on May 11, 2014.
