Skip to content

Loading…

There should be a way to tag package as root #158

Closed
mar-kolya opened this Issue · 25 comments

3 participants

@mar-kolya

Unfortunately roots command in its current implementation is unreliable.

Example: if I have a package A that has a package B in dev deps then package B will not show up in roots and when I install all roots I won't get package B installed.

I think that it would make sense to be able to just manually mark packages as roots and then be able to fetch that list and feed it into cpanm (or similar).

This may be just some sort of general 'tagging' mechanism for packages. So I can tag a package when I add it and then 'list' command provides an option to list packages tagged with specific tag.

This is probably more like a feature request. Would having this feature make sense?

@thaljef
Owner
@mar-kolya

Well, I personally would prefer to be able to set roots by hand. Like that:

pinto pull Package --set-as-root

Basically I'm coming from the usecase when I have a 'cpanfile' and I'm trying to get rid of it and get to Pinto. So the reasonable solution seems to just pull everything mentioned in cpanfile and set those packages as roots. This would mean that I can install everything from my pinto repo that is 'root' and this gives me exact same behaviour as having cpanfile, but I do not have to maintain separate cpan file anymore.

I mean 'heuristics' is good, but is difficult and fragile. A flag in DB seems simpler. So having DB flag in addition to heuristics seems reasonable. Or maybe having heuristics in addition to 'hard' DB flag.

@thaljef
Owner

Basically I'm coming from the usecase when I have a 'cpanfile' and I'm trying to get rid of it and get to Pinto.

For Stratopan, I use both cpanfile and Pinto. The cpanfile lists my top-level dependencies, and then I feed that to cpanm while using my Pinto repository as the --mirror.

So far, that was worked really well for me. And I know several other shops use Pinto that way too. It would be even better if pinto knew how to read a cpanfile (see #144). Or you could even just use a plain text file.

So the reasonable solution seems to just pull everything mentioned in cpanfile and set those packages as roots.

I think the trouble with that is that the roots change. As you add/remove/upgrade modules over time, the dependency graph shifts and the roots you had marked before may no longer be roots. So then you need a way to unmark them. That seems like a pain to maintain.

In a way, the cpanfile essentially is your list of roots. And it's probably easier to understand and edit than some tags inside Pinto. Perhaps best of all, you can generate it from your code using one of the prereq scanners on CPAN (or Dist::Zilla, if you use that).

@hartzell

Or if you'd like to be able to version that list and track it in the Pinto repo, wrap it up in a Task distro, e.g. Task::Deploy::MyApp or Task::Deploy::DevEnv, and add it to the repository like the rest of your stuff.

@mar-kolya

Well, currently we ended up building 'pintofile' as a list of all packages from pinto that just gets fed into cpanm. Unfortunately building that file is slow (because %M is slow).

@thaljef
Owner

Example: if I have a package A that has a package B in dev deps then package B will not show up in roots and when I install all roots I won't get package B installed.

This is easily fixable if we assume that roots should be irrespective of the type of dependency (e.g. configure/build/test/runtime/develop). I think we can justify that assumption because Pinto has no way of knowing how the dependencies relate to your application.

In other words, we can exclude the configure, build, and runtime dependencies from the roots because we can assume they will be installed as prerequisites for something. But test and develop dependencies may or may not be, depending on your cpanm options. Same goes for the recommended and suggested dependencies. Since Pinto has no idea if those dependencies are prerequisites for your application, so it must include them in the roots.

Does that make sense?

@mar-kolya

Yes, it indeed does.

So, some details on roots command issue. If you pull 'Pinto' into you repo it has Module::Faker as it's test dependency. And Module::Faker has a Text::Template as its runtime dependency. As the result Test::Template is not part of roots output. And it also doesn't get installed by cpanm (at least not by default).

This patch seems to fix this problem:

diff --git a/lib/Pinto/Schema/Result/Stack.pm b/lib/Pinto/Schema/Result/Stack.pm
index 7df3d85..924bcde 100644
--- a/lib/Pinto/Schema/Result/Stack.pm
+++ b/lib/Pinto/Schema/Result/Stack.pm
@@ -609,7 +609,8 @@ sub roots {
     for my $dist ( @dists ) {
         for my $prereq ($dist->prerequisites) {
             # TODO: Decide what to do about development prereqs
-            next if $prereq->is_core(in => $tpv) or $prereq->is_perl;
+            next if $prereq->is_core(in => $tpv) or $prereq->is_perl
+                or $prereq->phase eq 'test' or $prereq->phase eq 'develop';
             my %args = (target => $prereq->as_target, cache => \%cache);
             next unless my $prereq_dist = $self->get_distribution(%args);
             $is_prereq_dist{$prereq_dist} = 1;

Thanks!

@thaljef
Owner

This patch seems to fix this problem:

Yes, that's exactly what I had in mind :)

@mar-kolya

Looking forward to having that released then :)

@thaljef thaljef pushed a commit that referenced this issue
Jeffrey Ryan Thalhammer Include test and develop prereqs in roots.
Pinto has no way of knowing if *your* application requires
these, so it must include them in the roots.  Fixes GH #158.
d139464
@thaljef
Owner

I'm about to ship this as a development release to CPAN. It will include the colorizing changes for #155, which will require you to upgrade pintod as well (it is included in the distribution).

After a couple days on CPAN, I'll make a production release and upload it to Stratopan.

@thaljef thaljef pushed a commit that referenced this issue
Jeffrey Ryan Thalhammer v0.09992_01
      [INCOMPATILBE CHANGES]

      - The protocol used to communicate with a remote Pinto repository is now
      versioned with media types via the "Accept" header.  So if you upgrade
      to this version of pinto, you'll also need to upgrade pintod to this same
      version.  Going forward, the protocol will be versioned so that we
      can potentially support different versions of the client and server at
      the same time.  But for the time being, you'll get an exception telling
      you whether to upgrade your client (pinto) or the server (pintod).

      - The PINTO_COLORS and PINTO_COLOURS environment variables are deprecated
      in favor of PINTO_PALETTE.  You can still use the old names for now, but
      they will be removed in a future release.

      - When using the --all option with the "list" command, the default format
      shows the pin status as "?" (i.e. indeterminable) instead of showing the
      main module status.

      [BUG FIXES]

      - Colorization is now disabled by default when the output is being piped or
      redirected to a file.  This improves interoperability with tools like grep,
      sort, etc.  To force colorization, use the --color or --colour option.
      Fixes GH #155. Thanks @mar-kolya.

      - The "roots" will now include test and develop prereqs, since Pinto has no
      way of knowing how those distributions relate to your application. This
      fixes GH #158. Thanks @mar-kolya.

      [ENHANCEMENTS]

      - When using the "add" command, the paths to the local archives can now be
      expressed using file:/// URLs.

      - The PINTO_PAGER_OPTIONS environment variable can now be used to pass specific
      options to your pager when using it with pinto.  For example, to tell `less`
      to display colors.

      [DOCUMENTATION]

      - Revised the Tutorial and QuickStart documentation.
a1790d6
@thaljef
Owner

So, some details on roots command issue. If you pull 'Pinto' into you repo it has Module::Faker as it's test dependency. And Module::Faker has a Text::Template as its runtime dependency. As the result Text::Template is not part of roots output.

In that scenario, Text::Template is not the missing root, because it is a runtime prereq for Module::Faker. Module::Faker is the missing root. Agreed?

@mar-kolya

Yes, that is correct.

@thaljef thaljef pushed a commit that referenced this issue
Jeffrey Ryan Thalhammer v0.09993
      [INCOMPATILBE CHANGES]

      - The protocol used to communicate with a remote Pinto repository is now
      versioned with media types via the "Accept" header.  So if you upgrade
      to this version of pinto, you'll also need to upgrade pintod to this same
      version.  Going forward, the protocol will be versioned so that we
      can potentially support different versions of the client and server at
      the same time.  But for the time being, you'll get an exception telling
      you whether to upgrade your client (pinto) or the server (pintod).

      - The PINTO_COLORS and PINTO_COLOURS environment variables are deprecated
      in favor of PINTO_PALETTE.  You can still use the old names for now, but
      they will be removed in a future release.

      - When using the --all option with the "list" command, the default format
      shows the pin status as "?" (i.e. indeterminable) instead of showing the
      main module status.

      [BUG FIXES]

      - Colorization is now disabled by default when the output is being piped or
      redirected to a file.  This improves interoperability with tools like grep,
      sort, etc.  To force colorization, use the --color or --colour option.
      Fixes GH #155. Thanks @mar-kolya.

      - The "roots" will now include test and develop prereqs, since Pinto has no
      way of knowing how those distributions relate to your application. This
      fixes GH #158. Thanks @mar-kolya.

     - Fixed timezone.t test which would fail in some cases, ostensibly
     because the test environment would not allow us to open sockets
     (probably due to security reasons).  Thanks CPAN Testers.

      [ENHANCEMENTS]

      - When using the "add" command, the paths to the local archives can now be
      expressed using file:/// URLs.

      - The PINTO_PAGER_OPTIONS environment variable can now be used to pass specific
      options to your pager when using it with pinto.  For example, to tell `less`
      to display colors.

      [DOCUMENTATION]

      - Revised the Tutorial and QuickStart documentation.
b2ea642
@thaljef
Owner

Pinto 0.09993 has been shipped to CPAN and Stratopan. It has an improved roots command as we discussed. If this doesn't get the job done and you still need a way to manually mark the roots, then we can reopen this.

@thaljef thaljef closed this
@mar-kolya

Unfortunately there are still issues with roots command.

The easy one: dependency loops, I've submitted a patch for that.

The difficult one: conditional dependencies.

Here's an example. We have a project that depends on XML::Compile::WSDL11. When I pull it into the repo the whole bunch of XML::Compile* comes with it because of declared dependencies.

When I run roots command on that repo at first I get nothing because of the dependency loop. The is fixed now. So with that patch applied I get XML::Compile::SOAP as a root of all XML::Compile* set of modules. It happens that most of the XML::Compile* deps are being pulled by XML::Compile::SOAP depending on XML::Compile::SOAP::Daemon - this is a declared dependency. Unfortunately when XML::Compile::SOAP is build it requires XML::Compile::SOAP::Daemon only if it is already installed. The idea is that 'if module already installed then let's require it to me at least version x.yz, if module not installed - do not bother'. The net result is that XML::Compile::SOAP::Daemon doesn't get installed when XML::Compile::SOAP is installed and all XML::Compile::SOAP's deps do not get installed as well. As the result XML::Compile::WSDL11 doesn't get installed.

Unfortunately I do not know how to fix that. Yes, the module is probably wrong declaring something as a dep and not installing it. But this doesn't make my live much easier because the result of this is roots command doesn't provide me with list of things that would make my build complete. And I'm fairly certain there are other modules like that on cpan.

@thaljef
Owner

Perhaps the roots command is doomed then. As you pointed out, Pinto's understanding of the prerequisites may never be completely accurate because distributions can set dependencies dynamically. Explicitly marking the roots in Pinto would solve this, but I still feel this puts an unrealistic maintenance burden on users. Pinto should provide a way to do-the-right-thing without too much fuss (somehow).

I presume you've already tried this:

pinto list --format %p | cpanm ...

That would certainly install everything in the stack, but it makes cpanm do extra work to check for the presence of every module in every distribution. I think that should still be pretty fast, even though it may seem like overkill. If you really must have the shortest possible list of modules, then we need to find a faster way to get that list of "main" modules.

One possible caveat though: even if Pinto could give you a correct list of all the modules to install, the installation may still come out "wrong" because of the ordering of the modules. For example, a distribution might configure and build itself differently depending on whether you already have some XS backend module installed.

I think that is rare (most modules will just do-the-right-thing if you happen to install some backend module afterwards) but I have heard horror stories about getting modules installed "in the right order". So that's another reason why I think using an ordered cpanfile or text file as the canonical list of dependencies is still a good idea.

In conclusion: The roots command seems hopeless. So I'll look for ways to get a faster list of "main" modules. See #157 for reference.

@mar-kolya

Unfortunately pinto list --format %p | cpanm ... is not really an options since it makes cpanm always install everything which is prohibitively slow.

Yes, fast '%M' would be helpful.

@thaljef
Owner

Yes, feeding the full distribution name to cpanm does make it reinstall. But --format %p just lists package names, which will not cause cpanm to reinstall.

It is going to list every package in the stack instead of just the "main" one for each distribution. But that is probably ok. It just means cpanm will do a bit more work checking to see that you have each of those packages. But it won't install any more than it has to.

Do you see the difference?

@mar-kolya

Ah, my mistake. I'll see if '%p' | cpanm would be faster than %M' | cpanm.

Thanks for pointing that out!

@thaljef
Owner

I just tried it on a stack with 4531 packages from 720 distributions. After installing everything the first time into tmp/, this command took 44.5 seconds to complete:

pinto list --format %p | cpanm -L tmp

I know it seems weird to try installing every package in the stack even if many are from the same distribution. But in your case, it may actually be the "most technically correct" solution, because your goal really is to install every module, regardless of what distribution it is in.

Let me know how that goes for you. And I'll keep working on the %M anyway.

@thaljef
Owner

Oooo, I think you're going to like this: 1cc5c95

In a nutshell...

pinto install --all 

It just does what you would expect. I don't know why I didn't think of this before.

@thaljef
Owner

Some explanation: Internally, install --all is equivalent to using list --format %p and piping it to cpanm. The implementation is hackish for remote repositories, because Pinto::Server doesn't have a proper API. But I think it might be good enough for now.

@mar-kolya

I will try it out, thanks!

The only problem that I would see is the fact that currently I will not be able to avoid authentication, so I will have to come up with something for that.

@mar-kolya

I've tried '%M' vs - no '%M' on our stack. So installing everything vs installing only main packages adds about 10 seconds to installation process. I guess for now we will eat up the cost of slow '%M' since we build list of packages 'offline' anyway (because of authentication issue).

@thaljef
Owner

because of authentication issue.

Can you explain that a bit more?

I will profile both pinto and cpanm to see what is going on. But the performance of --format %p | cpanm might be as good as it gets. We will have to see.

How big is your stack? How long does it currently take to do a full build with %M or %p (assuming all modules have already been installed)? What would be an acceptable performance target for you?

@mar-kolya

Sorry I should have provided more details.

We have about 350 packages in our repo. I've tried to create to files: list of all modules ('%p') and list of all main modules ('%M-%p'). Then I've fed those files to cpanm. And cpanm processed file that contains only main modules 10 seconds faster then files that contains all modules - 4 seconds vs 14 seconds, assuming everything have been installed before. I think the cause of that is with all modules cpanm has to load more stuff to check all the versions and this slows it down. From that same perspective using 'roots' command would be the fastest if it worked.

As per authentication issue - we've discussed this in another ticket. Currently is it impossible to do 'unauthenticated' list and it seems wrong to ask for a password each time someone builds the project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.