Skip to content
This repository has been archived by the owner. It is now read-only.

Add the getCredentials and claimCredentials endpoint #54

Closed
wants to merge 4 commits into from

Conversation

@walac
Copy link

@walac walac commented Jan 25, 2018

claimCrendentials is used by the worker to get its initial credentials. The worker is validated using its Instance Identity Document, alongside its PKCS7 signature.

The scopes given are the as those provided by aws-provisioner.

@djmitche Could you please validate those scopes, please? Do we need to add or remove any scope?

@walac walac requested a review from jhford Jan 25, 2018

if (!await validateIdentityDocument(sIdentDoc, signature)) {
return E();
}
Copy link
Contributor

@djmitche djmitche Jan 25, 2018

This is so cool :)

Copy link
Contributor

@jhford jhford Jan 26, 2018

agreed!

lib/api.js Outdated
return E();
}

const provisionerId = this.keyPrefix.slice(0, this.keyPrefix.length - 1);
Copy link
Contributor

@djmitche djmitche Jan 25, 2018

I think this patch is most of the work toward removing the need for a keyPrefix? Is it time to add a provisionerId config?

Copy link
Contributor

@jhford jhford Jan 26, 2018

This work is not really involved in removal of keyPrefix, that's all work around aws resource tags. But soon!

Copy link
Contributor

@djmitche djmitche Jan 26, 2018

OK, and at that point we'd have a more straightforward way to get a provisionerId? This just seems like an odd way to find that value.

scopes: [
`assume:worker-type:${provisionerId}/${instance.workerType}`,
'assume:worker-id:*',
],
Copy link
Contributor

@djmitche djmitche Jan 25, 2018

Yes, these scopes look good.

Copy link
Contributor

@jhford jhford Jan 26, 2018

I think at this point, we should also strongly consider issuing
assume:worker-id:$region:$instanceId in place of the star scopes here. Since this change is going to require a change to how workers work, this might be a good time to go through with that testing. Both because we can, and because we finally have understanding that instance ids will not be reused.

Copy link
Contributor

@djmitche djmitche Jan 26, 2018

That would be so awesome!!

err ? reject(err) : accept(tmp);
});
});
}).catch(err => tmp.unlik());
Copy link
Contributor

@djmitche djmitche Jan 25, 2018

unlink

Also, I think you want to re-reject the promise here? Otherwise you'll just get back undefined from this promise. err => { tmp.unlink(); throw err; }

Also, wouldn't https://nodejs.org/api/fs.html#fs_fs_writefilesync_file_data_options do the trick?

@walac walac force-pushed the secrets-endpoint branch from 9ceeff8 to fd852b4 Jan 25, 2018
Copy link
Contributor

@jhford jhford left a comment

Great work. The API is coming along really nicely. Let me know if I can help out with the forge stuff!

lib/api.js Outdated
return E();
}

const provisionerId = this.keyPrefix.slice(0, this.keyPrefix.length - 1);
Copy link
Contributor

@jhford jhford Jan 26, 2018

This work is not really involved in removal of keyPrefix, that's all work around aws resource tags. But soon!

lib/api.js Outdated
const sIdentDoc = new Buffer(req.body.identityDocument, 'base64').toString();
const signature = new Buffer(req.body.signature, 'base64').toString();

const E = () => {
Copy link
Contributor

@jhford jhford Jan 26, 2018

I think this name is a little confusing. reportError would be much more clear. I would like to log these errors with reasons in our logs, so that we can see why any particular request is failing. Could we have this take an object and message for use in a log.warn(obj, msg) call?

scopes: [
`assume:worker-type:${provisionerId}/${instance.workerType}`,
'assume:worker-id:*',
],
Copy link
Contributor

@jhford jhford Jan 26, 2018

I think at this point, we should also strongly consider issuing
assume:worker-id:$region:$instanceId in place of the star scopes here. Since this change is going to require a change to how workers work, this might be a good time to go through with that testing. Both because we can, and because we finally have understanding that instance ids will not be reused.

@@ -515,6 +521,62 @@ api.declare({
res.reply(prices);
});

api.declare({
method: 'delete',
Copy link
Contributor

@jhford jhford Jan 26, 2018

In order to operate well with retries, etc, we should have a separate GET and DELETE endpoint here. The GET being used to obtain the credential and the DELETE is used to register receipt of the credentials. This is the same flow we have for provisioner secrets as currently implemented. Basically, the response handler running to completion without error is not a reliable signal for whether the worker type has received and understood the credential. We don't want a machine that fails to get its credential be unable to retry and have to shut down.

const temporary = require('temporary');
const fs = require('fs');

const AWS_PUBLIC_CERTIFICATE =
Copy link
Contributor

@jhford jhford Jan 26, 2018

I really like that you're building this into the code. I think if this cert ever needed to change, the deployment overhead would justify the work.

package.json Outdated
@@ -37,6 +37,7 @@
"taskcluster-lib-monitor": "^4.6.3",
"taskcluster-lib-testing": "^1.0.4",
"taskcluster-lib-validate": "^2.1.0",
"temporary": "^0.0.8",
Copy link
Contributor

@jhford jhford Jan 26, 2018

it's amazing that node doesn't have this built in.

});
});
}).catch(err => {
tmp.unlink();
Copy link
Contributor

@jhford jhford Jan 26, 2018

this is an async call which does not check the status of the operation. We should either call the sync version or give it a callback. The problem here is that if we stop deleting files we could bring down the service with a bunch of temporary files.


if (!await validateIdentityDocument(sIdentDoc, signature)) {
return E();
}
Copy link
Contributor

@jhford jhford Jan 26, 2018

agreed!

}

// see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html
module.exports = async function(identityDoc, signature) {
Copy link
Contributor

@jhford jhford Jan 26, 2018

I looked at this function and I agree that it definitely would accomplish the task. I'm concerned that it might not be the optimal approach, however. The concern is both overhead of spawning processes, but also that we're generating a bunch of files on disk and having to deal with the associated I/O complexity, which you've already encountered in tests. It also makes us depend on having an openssl binary on the server, which could complicated future deployments.

I did some research and it looks like there's a broadly adopted crypto library called forge that understands how to parse PKSC7 and the node's standard library crypto.createVerify function could be used to do the actual signature validation. I would really like us to do this in-process using Node to do validation if we can.

Copy link
Contributor

@jhford jhford Jan 29, 2018

I just chatted with @franziskuskiefer about this and his recommendation is that we not do crypto implemented in JS. Since crypto.createVerify is implemented in the OpenSSL portion of the node library, we should be fine to use Forge for parsing the PKCS7 files and the node crypto library for the verification. We should be careful to include notes that we not use the crypto portions of Forge, rather only the parsing portion.

'-noverify',
]);

openssl.on('close', exitCode => accept(!exitCode));
Copy link
Contributor

@jhford jhford Jan 26, 2018

given the importance of this comparison to our security, I think we should consider rejecting the promise if the exit code is non-zero.

@jhford jhford changed the title Add the claimCredentials endpoint Add the getCredentials and removeCredentials endpoint Feb 9, 2018
@jhford
Copy link
Contributor

@jhford jhford commented Feb 15, 2018

@walac how are things going?

@walac
Copy link
Author

@walac walac commented Feb 15, 2018

@walac how are things going?

I am still busy with docker-worker. But before that, the only issue that remained was PKCS verification. I researched node crypto and forge and none of those implement PKCS verification, afaict.

@walac walac force-pushed the secrets-endpoint branch 2 times, most recently from e6810ac to 3351bb8 Feb 24, 2018
@walac walac changed the title Add the getCredentials and removeCredentials endpoint Add the getCredentials and claimCredentials endpoint Feb 24, 2018
@walac walac force-pushed the secrets-endpoint branch from 3351bb8 to 7129f54 Feb 24, 2018
@walac
Copy link
Author

@walac walac commented Feb 24, 2018

@jhford I believe I fixed all the issues, could you please give another look at the patch?

Copy link
Contributor

@jhford jhford left a comment

Great progress, looking really good. Thanks for looking into using Forge, it looks good so far.

let p7;

try {
p7 = pkcs7.messageFromAsn1(asn1.fromDer(forge.util.decode64(signedData)));
Copy link
Contributor

@jhford jhford Feb 26, 2018

Looking at the code for forge.util.decode64, it looks like they are about to deprecate this function:

util.decode64 = function(input) {
 // TODO: deprecate: "Deprecated. Use util.binary.base64.decode instead."

We should use the function that's not soon to be deprecated

return null;
}

const doc = p7.rawCapture.content.value[0].value[0].value;
Copy link
Contributor

@jhford jhford Feb 26, 2018

If we're expecting each of those value lists to have a single index, can we have have assertions that both value properties are arrays and that they have exactly one index? Error messages here might be a little confusing in the case of an empty array, and could be indicative of a security issue in the case of too many indices in the array.

lib/api.js Outdated
return res.reportError('AuthorizationFailed', 'Failure to verify authorization');
};

const doc = getIdentityDocument(req.body.signature);
Copy link
Contributor

@jhford jhford Feb 26, 2018

This function should throw an error if the signature doesn't validate, and have it in a try/catch block. We can have a .code property on the thrown error to reflect whether it was a validation error or not, but we should interrupt normal execution of the program if trying to retrieve a document for which the signatures are invalid

lib/api.js Outdated
return res.reportError('AuthorizationFailed', 'Failure to verify authorization');
};

const doc = getIdentityDocument(req.body.signature);
Copy link
Contributor

@jhford jhford Feb 26, 2018

All other parts of the ec2-manager use the let declaration of function-local variables. If you'd like to propose changing that convention, let's do that in a different PR.

const assert = require('assert');
const asn1 = forge.asn1;
const pkcs7 = forge.pkcs7;
const oids = forge.oids;
Copy link
Contributor

@jhford jhford Feb 26, 2018

What about:

const {asn1, pkcs7, oids, util: forgeUtils} = require('node-forge');

Or using e.g. forge.pkcs7.messageFromAsn1(...);

test/api_test.js Outdated
certificate: 'myCertificate',
});

/*await client.claimCredentials({
Copy link
Contributor

@jhford jhford Feb 26, 2018

We should probably be calling this function for real in the test. Is there any reason why it's commented out? The whole commented out block is stuff that we must test.

Copy link
Author

@walac walac Feb 27, 2018

Good catch, I commented for a test and forgot to uncomment it

lib/api.js Outdated
'Yield credentials for the worker running in the given instance.',
].join(' '),
}, async function(req, res) {
const reportError = (obj, msg) => {
Copy link
Contributor

@jhford jhford Feb 26, 2018

Since there's only one call site for this function, we can probably remove it.

lib/api.js Outdated
}, 'Invalid signature');
}

const instances = await this.state.listInstances({
Copy link
Contributor

@jhford jhford Feb 26, 2018

Let's move the state-based logic for answering the question of "is this instance in a state to generate credentials" into a State.canClaimCredentials({id, region}, client) method.

test/api_test.js Outdated

before(() => {
taskcluster.createTemporaryCredentials = sandbox.stub();
taskcluster.createTemporaryCredentials.returns({
Copy link
Contributor

@jhford jhford Feb 26, 2018

If we set a variable expectedCredentials, we could use that to set the return value of this stub as well as using it to ensure that the credentials to test against would be using the same object with assume(actualCredentials).deeply.equals(expectedCredentials)

test/api_test.js Outdated
});

after(() => {
taskcluster.createTemporaryCredentials = createTemporaryCredentials;
Copy link
Contributor

@jhford jhford Feb 26, 2018

We shouldn't need to do this, rather we should be using the sandbox's .restore() functionality

@walac walac force-pushed the secrets-endpoint branch from 7129f54 to 2b3a0ad Feb 28, 2018
@walac
Copy link
Author

@walac walac commented Feb 28, 2018

@jhford I think I fixed most (all?) of the comments. Regarding SHA256, I looked at it but it has no official documentation, but I know it is not PKCS7. Can we look at it in a follow-up PR?

One more thing: we currently depend on the node-forge master branch to make this work due to this and this.

@jhford
Copy link
Contributor

@jhford jhford commented Feb 28, 2018

@jhford I think I fixed most (all?) of the comments. Regarding SHA256, I looked at it but it has no official documentation, but I know it is not PKCS7. Can we look at it in a follow-up PR?

There's an rsa2048 file in the instance metadata that is an RSA-SHA256 signature that we could use. My understanding is that PKCS7 is a file format for storing information. It's not inherently RSA or DSA, SHA1 or SHA256. I think they just used the /pkcs7 endpoint for whatever reason.

One more thing: we currently depend on the node-forge master branch to make this work due to this and this.

Good to know. Do you know when (if) they're going to ship those in a release?

@walac walac force-pushed the secrets-endpoint branch 3 times, most recently from 8805d90 to f54c957 Feb 28, 2018
@walac
Copy link
Author

@walac walac commented Mar 1, 2018

There's a rsa2048 file in the instance metadata that is an RSA-SHA256 signature that we could use

Just to make sure we are on the same page, are you talking about http://169.254.169.254/latest/dynamic/instance-identity/signature ?

@walac walac force-pushed the secrets-endpoint branch 2 times, most recently from 72a0f65 to 75f5629 Mar 1, 2018
@walac walac force-pushed the secrets-endpoint branch from 75f5629 to 4cf5cc0 Mar 23, 2018
@walac walac force-pushed the secrets-endpoint branch from 4cf5cc0 to 4b0c055 Apr 6, 2018
@jhford
Copy link
Contributor

@jhford jhford commented Jul 3, 2018

@walac just a heads up that the library for doing verification is up for review at taskcluster/iid-verify#1 when you have a chance :)

@walac walac force-pushed the secrets-endpoint branch 2 times, most recently from 484b2ae to a6f822e Aug 29, 2018
@walac walac requested a review from jhford Aug 29, 2018
Copy link
Contributor

@jhford jhford left a comment

It's looking like some good progress has been made. There's a couple issues that I think we should resolve before merging, but I think we're definitely in the right direction.

test/api_test.js Outdated
dmljZXMgTExDAgkAlrpI2eVeGmcwCQYFKw4DAhoFAKBdMBgGCSqGSIb3DQEJAzELBgkqhkiG9w0B
BwEwHAYJKoZIhvcNAQkFMQ8XDTE4MDEyMjEyMzAwMlowIwYJKoZIhvcNAQkEMRYEFK1hHB7W5la2
AWAHCWVgYPYyJzAxMAkGByqGSM44BAMELzAtAhUAsQXD04cP48o7HVHWJtVRHZEUkBICFHcuPVAu`,
doc: util.encode64(
Copy link
Contributor

@jhford jhford Aug 29, 2018

This is using the util.encode64 function from the node-forge package, which we're no longer using

test/api_test.js Outdated
const assume = require('assume');
const main = require('../lib/main');
const {api} = require('../lib/api');
const sinon = require('sinon');
const uuid = require('uuid');
const {util} = require('node-forge');
Copy link
Contributor

@jhford jhford Aug 29, 2018

This module is required, but isn't in package.json any longer. Since iid-verify is being used now, we can probably remove the requires.

test/api_test.js Outdated
certificate: 'myCertificate',
};

let doc = JSON.parse(util.decode64(pkcs7TestData.doc));
Copy link
Contributor

@jhford jhford Aug 29, 2018

This is also using the forge package's util module

await testErrorReturn(client.getCredentials, pkcs7TestData.invalid);
});

it('claimed successfully', async() => {
Copy link
Contributor

@jhford jhford Aug 29, 2018

This test fails when I run it locally:

  1) Api credentials claimed successfully:
     Error: Failure to verify authorization
----
method:     getCredentials
errorCode:  AuthorizationFailed
statusCode: 403
time:       2018-08-29T17:26:45.582Z
      at node_modules/taskcluster-client/lib/client.js:327:21
      at node_modules/taskcluster-client/node_modules/promise/lib/core.js:33:15
      at flush (node_modules/taskcluster-client/node_modules/asap/asap.js:27:13)
      at process._tickCallback (internal/process/next_tick.js:61:11)

lib/api.js Outdated
validate(strDoc, req.body.signature);
} catch (err) {
log.error({
err: e,
Copy link
Contributor

@jhford jhford Aug 29, 2018

I'm not sure where e is being defined. We can probably use {err} type notation here, since the local variable is call err and we're setting a property called err

package.json Outdated
"lodash": "^4.17.4",
"nyc": "^11.0.3",
"pg": "^7.4.0",
"pg-pool": "^2.0.3",
"promise-retry": "^1.1.1",
Copy link
Contributor

@jhford jhford Aug 29, 2018

I'm only seeing this in test code, maybe we could move it to devDependencies if it's retained?

lib/state.js Outdated Show resolved Hide resolved
@@ -406,6 +416,44 @@ class State {
return result.rowCount === 1;
}

async claimCredentials({region, id}, client) {
Copy link
Contributor

@jhford jhford Aug 29, 2018

This function looks great!

assert.equal(result.rowCount, 1, 'updating instance state had incorrect rowCount');
}

async canClaimCredentials({region, id}, client) {
Copy link
Contributor

@jhford jhford Aug 29, 2018

Also, looking great!

package.json Outdated
@@ -6,8 +6,7 @@
"author": "John Ford <john@johnford.org>",
"license": "MPL-2.0",
"engines": {
"node": "^9.11.1",
"yarn": "^1.0.0"
"node": ">=10"
Copy link
Contributor

@jhford jhford Aug 29, 2018

Per the best practises document, we should be using a specific version of node.

@walac walac force-pushed the secrets-endpoint branch from a6f822e to a9d44db Aug 29, 2018
@walac walac force-pushed the secrets-endpoint branch 5 times, most recently from 4c738bd to f0f1023 Aug 29, 2018
@walac walac requested a review from jhford Aug 29, 2018
@walac
Copy link
Author

@walac walac commented Aug 29, 2018

@jhford I believe I fixed all the problems, but Travis still fails and it feels like it is unrelated. Could you please confirm that?

@jhford
Copy link
Contributor

@jhford jhford commented Aug 29, 2018

@walac I believe that the problem is there's something funky going on with the Postgres installation in travis. I haven't ever been able to reproduce that failure locally or in heroku, but it happens on occasion in travis. I will be adding extra logging to see if I can figure out what's going on.

@@ -518,6 +519,13 @@ describe('State', () => {

});

it('should be able to claim credentials', async() => {
await db.insertInstance(defaultInst);
assume(await db.canClaimCredentials(defaultInst)).to.be.true;
Copy link
Contributor

@jhford jhford Aug 29, 2018

This and .false need to be invoked with () or else they don't throw.

$ node
> let assume = require('assume')
undefined
> assume(false).to.be.true;
[Function: ok]
> assume(false).to.be.true();
Thrown: Error: Unknown assertation failure occured, assumed `false` to equal (===) true

@jhford
Copy link
Contributor

@jhford jhford commented Aug 29, 2018

@walac I've taken another look and a lot of the concerns aren't addressed. Maybe we should hop on vidyo and chat about it? I can meet tomorrow at 16:30 CEST (Berlin) if you'd like.

@walac
Copy link
Author

@walac walac commented Aug 30, 2018

@jhford hah, I see a lot of your comments don't show up for me in the conversation tab.

@walac walac force-pushed the secrets-endpoint branch 3 times, most recently from e6f799f to 75ab61b Aug 30, 2018
@walac walac requested a review from jhford Aug 30, 2018
Copy link
Contributor

@jhford jhford left a comment

Thanks, most of the feedback is addressed. I think the only things left to do here are figure out the unreliable test and add the facilities for picking the correct public key, then we're ready to merge I think.

// - http://www.pkiglobe.org/pkcs7.html
// - https://tools.ietf.org/html/rfc2315
function validate(doc, signedData) {
// TODO: use a per region certificate
Copy link
Contributor

@jhford jhford Aug 31, 2018

Let's put in the logic about finding the region and getting the right public key in this patch, so that the patch for the other public keys is only adding strings. Since we're already parsing the doc in the endpoint handler, maybe just pass in a region and throw an error if it's an unknown region?

lib/state.js Outdated
'INSERT INTO instances',
'(id, "workerType", region, az, "instanceType", state, "imageId", launched, "lastEvent")',
'VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)',
'INSERT INTO instances', '(' + [
Copy link
Contributor

@jhford jhford Aug 31, 2018

There's no reason for a second level of strings being joined here.

test/api_test.js Outdated
async function testErrorReturn(fn, doc) {
// When we run "yarn test" we sometimes get ECONNREFUSED for no clear reason
// Let's add a retry to avoid this
let code = await promiseRetry(retry =>
Copy link
Contributor

@jhford jhford Aug 31, 2018

I'm still concerned about the need to do promise retries here. If it's not possible to validate reliably in the unit tests, it will be difficult to get reliable outcomes in production. How often does this fail? I looked at the test file and it seems like we do wait for the server to finish starting, so getting ECONNREFUSED is something I cannot figure out in this case.

@walac walac requested a review from jhford Sep 3, 2018
Wander Lairson Costa added 4 commits Oct 8, 2018
claimCredentials receives a RSA2048 signature of the identity document
and returns the credentials for the worker.

The document is validated according using the iid-verify [1] package.

Informations about the instance, like instance-id and region
are extracted from the identity document.

The given instance must be in a running state, otherwise the endpoint
will fail.

As iid-verify requires node 10 or newer, we upgrade the required node
version, as well as packages that fail to build with this node version.

[1] https://www.npmjs.com/package/iid-verify
@walac walac force-pushed the secrets-endpoint branch from 21fd067 to 313a423 Oct 8, 2018
@walac walac closed this Oct 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants