Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(import): convert package-lock.json to yarn.lock #5745

Merged
merged 14 commits into from
May 17, 2018

Conversation

imsnif
Copy link
Member

@imsnif imsnif commented Apr 27, 2018

This is the first step to addressing: #5654

Overview (TLDR)

At the moment, one is able to yarn import using node_modules as the dependency-tree state to create a yarn.lock file.
The feature created by this PR allows one to yarn import using package-lock.json as the dependency-tree state. When one enters the yarn import command, if a package-lock.json file exists, yarn would attempt to import from it. If not, it would import from node_modules normally.

How does this work?

A logical dependency tree is created from the package-lock.json file, then yarn walks through the dependencies as normal, using the logical tree to determine the package details. Often, this tree will not include the full information needed by yarn (eg. package shasum), and for this reason the package manifest is downloaded from the registry. This makes the process a little lengthy, but not terribly so. I deem this acceptable, seeing as this is not an everyday process. Would love to hear your thoughts if you disagree.

I opted not to fall back on (or search initially for) the manifest in node_modules, because I felt this would be error prone and introduce non-trivial complexity (eg. corrupted node_modules).

Limitations

Rarely, a yarn.lock would not be 100% logically identical to the package-lock.json from which it was created. In all incidents I encountered, this happens because of built-in limitations in yarn.
An example is two different packages with the same semver range string (eg. ^2.0.0) being resolved in package-lock.json to two different versions (eg. 2.0.1 and 2.0.2).
Another example is bundled dependencies.

I encourage the readers to test things for themselves (if there is interest, I have a script that logically compares yarn/npm trees - I can clean it up and post it somewhere).

Open Questions / Issues

  1. This PR only adds a package-lock.json import feature and does not address the full concerns raised in the issue linked above. I feel this is a good first step, after which we can continue discussing whether yarn install should issue a warning/error if it finds a package-lock.json file, and whether import should delete the package-lock.json file after successfully importing. As well as the myriad of other concerns raised there.
  2. The careful reader would notice that there is a slight discrepancy in the import types. By nature when importing from node_modules, the user would end up with a fully installed node_modules folder consistent with their yarn.lock file. This is not the case when importing from package-lock.json. One must yarn install after yarn importing from a package-lock.json. In my eyes, this is a desired behaviour for the conversion (which I see more as a 'surgical fix') even if it is not such a desired behaviour for the import command. I chose to give more weight to the former, but am of course open to different approaches.
  3. Continuing point 2, an empty node_modules folder is created when importing from package-lock.json as mentioned here: extraneous files always checks/cleans node_modules, ignoring --modules-folder specified folder #5419

Tests

In the tests for this feature I copied the relevant cases from the existing import tests. I did not (as I initially intended) add tests that make a logical comparison of the created yarn.lock against the originating package-lock.json.
The existing cases use file snapshots, and I felt this would be more coherent upon failure. Would love to hear differing opinions though.

@buildsize
Copy link

buildsize bot commented Apr 27, 2018

File name Previous Size New Size Change
yarn-[version].noarch.rpm 910.04 KB 913.48 KB 3.44 KB (0%)
yarn-[version].js 3.94 MB 3.96 MB 14.01 KB (0%)
yarn-legacy-[version].js 4.1 MB 4.12 MB 15.1 KB (0%)
yarn-v[version].tar.gz 914.98 KB 918.48 KB 3.5 KB (0%)
yarn_[version]all.deb 676.47 KB 678.65 KB 2.18 KB (0%)

@imsnif imsnif requested review from BYK and arcanis April 27, 2018 12:04
@imsnif
Copy link
Member Author

imsnif commented Apr 27, 2018

@iarna, @zkat - if you'd be willing to provide your input I'd very much appreciate it.

@imsnif
Copy link
Member Author

imsnif commented Apr 29, 2018

Removed node4 support for this feature at @BYK's kind recommendation. :)

@Daniel15
Copy link
Member

Daniel15 commented May 4, 2018

38% size increase (from 3.95 MB to 5.46 MB) seems pretty substantial... Is all that weight from pacote?

@zkat
Copy link
Contributor

zkat commented May 4, 2018

Mostly bluebird, through cacache and pacote and such, yeah:

148K    node_modules/readable-stream
184K    node_modules/cacache
196K    node_modules/pacote
208K    node_modules/tar
216K    node_modules/smart-buffer
228K    node_modules/iconv-lite/encodings/tables
256K    node_modules/bluebird/js/release
288K    node_modules/es6-promise/dist
340K    node_modules/iconv-lite/encodings
376K    node_modules/socks
380K    node_modules/es6-promise
396K    node_modules/iconv-lite
432K    node_modules/bluebird/js/browser
688K    node_modules/bluebird/js
704K    node_modules/bluebird
6.1M    node_modules
6.1M    total

And you really don't need pacote here: you should be able to get the package manifest using Yarn's built in whatever-the-thing-is. You should be able to get by with only adding npm-logical-tree (which will only add ~28K or so to the unbundled version. It's basically negligible.)

@imsnif
Copy link
Member Author

imsnif commented May 4, 2018

I basically agree with everything.
My original intent in using pacote was to make it easier to get a generic "logical manifest tree" from a package.json+lockfile combination eventually. Maybe I was getting ahead of myself. :)
In any case, I only noticed the increased build size after submitting this - so yeah, it's definitely not justifiable. I'll whip up a version using the resolvers instead of pacote in the next few days.

While I do this - are there any other unrelated concerns?

@zkat
Copy link
Contributor

zkat commented May 5, 2018

(for the record, the reason pacote is so big is because you are literally pulling in npm's entire networking, caching, and package resolution layer. You're basically pulling in half the npm installer minus the tree calculation and reading logic. pacote does a lot of heavy lifting for being a tiny api :P)

@imsnif
Copy link
Member Author

imsnif commented May 6, 2018

@zkat - using pacote in this case was just a bad call on my part. :)

I removed pacote now in favor of yarn's native package manifest fetching. Build size is back to normal now.

If anyone wants to comment, give this a go or review, that would be great.

@imsnif imsnif requested a review from rally25rs May 9, 2018 10:03
Copy link
Contributor

@rally25rs rally25rs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some initial comments. So far it looks good. I think I understand what is going on here and it seems sound. I'll try to actually test it out on a repo later.

In terms of future use, I can see the case where at some point we might want to prompt or auto convert during yarn install and it might be nice to have some of this as it's own "npm lockfile converter" module, but that is refactoring that could happen later.

createLogicalDependencyTree(packageJson: ?string, packageLock: ?string): LogicalDependencyTree {
try {
if (!packageJson || !packageLock) {
throw new Error('files missing');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this string should probably come from reporter.lang and maybe report what files are expected.

Actually, since their existence is already checked around line 337-ish, do these even need to be 'nullable' types? Although I imagine Flow complains if you change ?string to just string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is some flow acrobatics. I don't like it either.
I thought not to bother with reporter.lang in this case because it will be caught by the below try/catch block by definition. I'll try to work out a better way.

Copy link
Member

@arcanis arcanis May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change createLogicalDependencyTree to only accept non-nullable types, then use invariant calls before calling it at line 340 to instruct flow that they cannot be null. Something like this:

invariant(packageJson, 'packageJson should exist');
invariant(packageLock, 'packageJson should exist');

That being said, the condition at line 336 should be enough - maybe the fact that you're storing this inside a variable is throwing Flow off. Maybe you can try to split it?

let importSource = 'node_modules';
if (packageJson && packageLock && semver.satisfies(nodeVersion, '>=5.0.0')) {
    importSource = 'package-lock.js';
    // ...
}

if (importSource === 'package-lock.json') {
this.reporter.info(this.reporter.lang('importPackageLock'));
const tree = this.createLogicalDependencyTree(packageJson, packageLock);
if (this.resolver instanceof ImportPackageResolver) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a known case where resolver isn't ImportPackageResolver? If so, what would happen? Since the package-lock tree won't be set on the resolver, should this check maybe be part of packageJson && packageLock && semver.satisfies(nodeVersion, '>=5.0.0') ? 'package-lock.json' : 'node_modules'; to prevent it from taking this code path? (if resolver is not instanceof ImportPackageResolver then we build the logical tree and don't use it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More flow acrobatics, I'm afraid. :/ I've seen this done elsewhere in the file though. I'm not sure if adding that check below will satisfy flow, but can give it a shot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use a typecasting expression, or an invariant 🙂

while (!path.relative(this.config.cwd, cwd).startsWith('..')) {
const loc = path.join(cwd, 'node_modules', name);
const info = await this.config.getCache(`import-resolver-${loc}`, () => this.resolveLocation(loc));
if (this.request instanceof ImportPackageRequest && this.request.fixedVersionPattern) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we refactor these cases into separate functions for readability/clarity? something like

async resolve(): Promise<Manifest> {
  if (this.request instanceof ImportPackageRequest && this.request.fixedVersionPattern) {
    return await this._resolveFromFixedVersions();
  } else {
    return await this._resolveFromNodeModules();
  }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like it!

@imsnif
Copy link
Member Author

imsnif commented May 10, 2018

Thanks for the input @rally25rs + @arcanis!
I'll batch these up with any other changes @rally25rs has from testing this.

Copy link
Contributor

@rally25rs rally25rs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested it out on a couple small projects I had and it seems to work well. Unfortunately all my large projects (>1k packages total) have some yarn-only things going on so I can't npm i them, but I think it looks solid.

If we can try using invariant as @arcanis suggested to clean up some of the unnecessary conditionals

We should also update the website docs to explain the logic.


edit

I did manage to get the error

error An unexpected error occurred: "should have a resolved reference".

out of it. I'd share the package.json, but it has some private github repos in it. Let me see if I can narrow it down to something I can share...

@rally25rs
Copy link
Contributor

@imsnif it looks like this package causes an error:

  "dependencies": {
    "watchify": "3.2.1"
  }
yarn import v1.6.0
info found npm package-lock.json, converting to yarn.lock
warning watchify > browserify > glob > minimatch@1.0.0: Please update to minimatch 3.0.2 or higher to avoid a RegExp DoS issue
warning watchify > browserify > glob > graceful-fs@3.0.11: please upgrade to graceful-fs 4 for compatibility with current and future versions of Node.js
warning watchify > browserify > glob > graceful-fs > natives@1.1.3: This module relies on Node.js's internals and will break at some point. Do not use it, and update to graceful-fs@4.x.
error An unexpected error occurred: "should have a resolved reference".
info If you think this is a bug, please open a bug report with the information provided in "yarn-error.log".
info Visit https://yarnpkg.com/en/docs/cli/import for documentation about this command.

Using npm v5.8.0

In package-lock.json it looks like a bunch of packages depend on safe-buffer@5.1.1 but the only entry for that package at that version is

        "safe-buffer": {
          "version": "5.1.1",
          "bundled": true
        },

It looks like fsevents bundles safe-buffer, and then all the other non-bundled packages under fsevents that require safe-buffer just use it's bundled version.

@BYK
Copy link
Member

BYK commented May 11, 2018

Yarn doesn't handle bundledDependencies properly, can this be the reason?

Theory: since yarn doesn't handle bundled dependencies, it expects these packages in the cache but they are not there so it fails.

@imsnif
Copy link
Member Author

imsnif commented May 11, 2018

Nice find!

@rally25rs, @BYK - So after some digging:
This also happens when importing the same package from node_modules.
It's a bug that is due to this line: https://github.com/yarnpkg/yarn/blob/master/src/package-request.js#L208
Briefly, this is due to some assumptions made by import about version ranges and fixed versions: In the above method, the PackageRequest gets the highest semver range matching its pattern but that range does not satisfy the fixedVersion - which is not a range in this case, so it finds no manifest.
This happens in this case because there is another safe-buffer package here that matches this range: 5.1.2.

I fixed it locally for this test-case by overriding the resolveToExistingVersion method in import. Will work on it today and should have this patched up in a few hours (will also include the flow fixes).

@imsnif
Copy link
Member Author

imsnif commented May 12, 2018

I fixed some of the unnecessary conditionals after the above comments from @rally25rs and @arcanis. I went in a bit of a different direction when creating the dependency tree. In my eyes, this is clearer. What do you think?

I also refactored the resolve method in ImportResolver as @rally25rs suggested.

Finally, I issued a fix for @rally25rs's bug discussed above. Since this bug happened both when importing package-lock.json and node_modules, I also added tests for both cases. In the fixtures, I mocked it down to a smaller dependency tree than the full watchify module. Manually I tested importing from an npm installation of watchify@3.2.1 and it works.
The bug is a little involved, so I'll try to explain it as best I can:

It essentially has to do with patterns in the resolver being overridden.

So say we have this pattern in resolver.patterns:

{
  'depA@^1.0.0': Manifest {
     name: 'depA',
     version: '1.0.4'
   }
}

When we get to PackageRequest.find, we look through the resolved patterns to see whether the manifest that we have matches a version we already have in resolver.patterns:

const resolved: ?Manifest =

Now we get the request depA@^1.0.4 pointing at the same manifest in our patterns (version 1.0.4). We add it to a delayed queue and sort it out later in order not to create duplication:

yarn/src/package-request.js

Lines 244 to 247 in 05080da

if (resolved) {
this.resolver.reportPackageWithExistingVersion(this, info);
return;
}

The problem is, that we make the assumption that said Manifest will still be in resolver.patterns when we sort it out later here:
const resolved: ?Manifest = this.resolver.getHighestRangeVersionMatch(name, solvedRange, info);

In the normal course of an installation, this assumption works. When importing from a state of fixed versions, we can sometimes have an identical pattern resolving to a different manifest (one that does not satisfy the queued range). In this case, eg. depA@^1.0.0 ===> version: 1.0.3. And thus, we would throw here:
invariant(resolved, 'should have a resolved reference');

I fixed this by overriding the resolveToExisting PackageRequest method for ImportPackageRequest. It essentially does the same as the overridden original method, but if it does not find a resolved pattern, it resolves itself to its own manifest.

Would love to hear additional comments on these fixes or other issues.

@rally25rs
Copy link
Contributor

rally25rs commented May 16, 2018

With the latest changes, I get some big differences (like half the packages are missing) if I yarn install and compare that yarn.lock against one that is the result of npm i && yarn import.

Edit:

Scratch that. Don't know what I did, but now the difference is minimal. I think this should be OK to ship 👍

@imsnif imsnif merged commit 661065b into yarnpkg:master May 17, 2018
@zkat zkat mentioned this pull request May 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants