Skip to content
This repository has been archived by the owner on Nov 12, 2019. It is now read-only.

Add the getCredentials and claimCredentials endpoint #54

Closed
wants to merge 4 commits into from

Conversation

walac
Copy link

@walac walac commented Jan 25, 2018

claimCrendentials is used by the worker to get its initial credentials. The worker is validated using its Instance Identity Document, alongside its PKCS7 signature.

The scopes given are the as those provided by aws-provisioner.

@djmitche Could you please validate those scopes, please? Do we need to add or remove any scope?

@walac walac requested a review from jhford January 25, 2018 13:55

if (!await validateIdentityDocument(sIdentDoc, signature)) {
return E();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so cool :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed!

lib/api.js Outdated
return E();
}

const provisionerId = this.keyPrefix.slice(0, this.keyPrefix.length - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this patch is most of the work toward removing the need for a keyPrefix? Is it time to add a provisionerId config?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This work is not really involved in removal of keyPrefix, that's all work around aws resource tags. But soon!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, and at that point we'd have a more straightforward way to get a provisionerId? This just seems like an odd way to find that value.

scopes: [
`assume:worker-type:${provisionerId}/${instance.workerType}`,
'assume:worker-id:*',
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these scopes look good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at this point, we should also strongly consider issuing
assume:worker-id:$region:$instanceId in place of the star scopes here. Since this change is going to require a change to how workers work, this might be a good time to go through with that testing. Both because we can, and because we finally have understanding that instance ids will not be reused.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be so awesome!!

err ? reject(err) : accept(tmp);
});
});
}).catch(err => tmp.unlik());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unlink

Also, I think you want to re-reject the promise here? Otherwise you'll just get back undefined from this promise. err => { tmp.unlink(); throw err; }

Also, wouldn't https://nodejs.org/api/fs.html#fs_fs_writefilesync_file_data_options do the trick?

Copy link
Contributor

@jhford jhford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. The API is coming along really nicely. Let me know if I can help out with the forge stuff!

lib/api.js Outdated
return E();
}

const provisionerId = this.keyPrefix.slice(0, this.keyPrefix.length - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This work is not really involved in removal of keyPrefix, that's all work around aws resource tags. But soon!

lib/api.js Outdated
const sIdentDoc = new Buffer(req.body.identityDocument, 'base64').toString();
const signature = new Buffer(req.body.signature, 'base64').toString();

const E = () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this name is a little confusing. reportError would be much more clear. I would like to log these errors with reasons in our logs, so that we can see why any particular request is failing. Could we have this take an object and message for use in a log.warn(obj, msg) call?

scopes: [
`assume:worker-type:${provisionerId}/${instance.workerType}`,
'assume:worker-id:*',
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at this point, we should also strongly consider issuing
assume:worker-id:$region:$instanceId in place of the star scopes here. Since this change is going to require a change to how workers work, this might be a good time to go through with that testing. Both because we can, and because we finally have understanding that instance ids will not be reused.

@@ -515,6 +521,62 @@ api.declare({
res.reply(prices);
});

api.declare({
method: 'delete',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to operate well with retries, etc, we should have a separate GET and DELETE endpoint here. The GET being used to obtain the credential and the DELETE is used to register receipt of the credentials. This is the same flow we have for provisioner secrets as currently implemented. Basically, the response handler running to completion without error is not a reliable signal for whether the worker type has received and understood the credential. We don't want a machine that fails to get its credential be unable to retry and have to shut down.

const temporary = require('temporary');
const fs = require('fs');

const AWS_PUBLIC_CERTIFICATE =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like that you're building this into the code. I think if this cert ever needed to change, the deployment overhead would justify the work.

package.json Outdated
@@ -37,6 +37,7 @@
"taskcluster-lib-monitor": "^4.6.3",
"taskcluster-lib-testing": "^1.0.4",
"taskcluster-lib-validate": "^2.1.0",
"temporary": "^0.0.8",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's amazing that node doesn't have this built in.

});
});
}).catch(err => {
tmp.unlink();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an async call which does not check the status of the operation. We should either call the sync version or give it a callback. The problem here is that if we stop deleting files we could bring down the service with a bunch of temporary files.


if (!await validateIdentityDocument(sIdentDoc, signature)) {
return E();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed!

}

// see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html
module.exports = async function(identityDoc, signature) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at this function and I agree that it definitely would accomplish the task. I'm concerned that it might not be the optimal approach, however. The concern is both overhead of spawning processes, but also that we're generating a bunch of files on disk and having to deal with the associated I/O complexity, which you've already encountered in tests. It also makes us depend on having an openssl binary on the server, which could complicated future deployments.

I did some research and it looks like there's a broadly adopted crypto library called forge that understands how to parse PKSC7 and the node's standard library crypto.createVerify function could be used to do the actual signature validation. I would really like us to do this in-process using Node to do validation if we can.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just chatted with @franziskuskiefer about this and his recommendation is that we not do crypto implemented in JS. Since crypto.createVerify is implemented in the OpenSSL portion of the node library, we should be fine to use Forge for parsing the PKCS7 files and the node crypto library for the verification. We should be careful to include notes that we not use the crypto portions of Forge, rather only the parsing portion.

'-noverify',
]);

openssl.on('close', exitCode => accept(!exitCode));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the importance of this comparison to our security, I think we should consider rejecting the promise if the exit code is non-zero.

@jhford jhford changed the title Add the claimCredentials endpoint Add the getCredentials and removeCredentials endpoint Feb 9, 2018
@jhford
Copy link
Contributor

jhford commented Feb 15, 2018

@walac how are things going?

@walac
Copy link
Author

walac commented Feb 15, 2018

@walac how are things going?

I am still busy with docker-worker. But before that, the only issue that remained was PKCS verification. I researched node crypto and forge and none of those implement PKCS verification, afaict.

@walac walac force-pushed the secrets-endpoint branch 2 times, most recently from e6810ac to 3351bb8 Compare February 24, 2018 16:43
@walac walac changed the title Add the getCredentials and removeCredentials endpoint Add the getCredentials and claimCredentials endpoint Feb 24, 2018
@walac
Copy link
Author

walac commented Feb 24, 2018

@jhford I believe I fixed all the issues, could you please give another look at the patch?

Copy link
Contributor

@jhford jhford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress, looking really good. Thanks for looking into using Forge, it looks good so far.

let p7;

try {
p7 = pkcs7.messageFromAsn1(asn1.fromDer(forge.util.decode64(signedData)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code for forge.util.decode64, it looks like they are about to deprecate this function:

util.decode64 = function(input) {
 // TODO: deprecate: "Deprecated. Use util.binary.base64.decode instead."

We should use the function that's not soon to be deprecated

return null;
}

const doc = p7.rawCapture.content.value[0].value[0].value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're expecting each of those value lists to have a single index, can we have have assertions that both value properties are arrays and that they have exactly one index? Error messages here might be a little confusing in the case of an empty array, and could be indicative of a security issue in the case of too many indices in the array.

lib/api.js Outdated
return res.reportError('AuthorizationFailed', 'Failure to verify authorization');
};

const doc = getIdentityDocument(req.body.signature);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should throw an error if the signature doesn't validate, and have it in a try/catch block. We can have a .code property on the thrown error to reflect whether it was a validation error or not, but we should interrupt normal execution of the program if trying to retrieve a document for which the signatures are invalid

lib/api.js Outdated
return res.reportError('AuthorizationFailed', 'Failure to verify authorization');
};

const doc = getIdentityDocument(req.body.signature);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other parts of the ec2-manager use the let declaration of function-local variables. If you'd like to propose changing that convention, let's do that in a different PR.

const assert = require('assert');
const asn1 = forge.asn1;
const pkcs7 = forge.pkcs7;
const oids = forge.oids;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

const {asn1, pkcs7, oids, util: forgeUtils} = require('node-forge');

Or using e.g. forge.pkcs7.messageFromAsn1(...);

test/api_test.js Outdated
certificate: 'myCertificate',
});

/*await client.claimCredentials({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably be calling this function for real in the test. Is there any reason why it's commented out? The whole commented out block is stuff that we must test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I commented for a test and forgot to uncomment it

lib/api.js Outdated
'Yield credentials for the worker running in the given instance.',
].join(' '),
}, async function(req, res) {
const reportError = (obj, msg) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's only one call site for this function, we can probably remove it.

lib/api.js Outdated
}, 'Invalid signature');
}

const instances = await this.state.listInstances({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move the state-based logic for answering the question of "is this instance in a state to generate credentials" into a State.canClaimCredentials({id, region}, client) method.

test/api_test.js Outdated

before(() => {
taskcluster.createTemporaryCredentials = sandbox.stub();
taskcluster.createTemporaryCredentials.returns({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we set a variable expectedCredentials, we could use that to set the return value of this stub as well as using it to ensure that the credentials to test against would be using the same object with assume(actualCredentials).deeply.equals(expectedCredentials)

test/api_test.js Outdated
});

after(() => {
taskcluster.createTemporaryCredentials = createTemporaryCredentials;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to do this, rather we should be using the sandbox's .restore() functionality

@walac
Copy link
Author

walac commented Feb 28, 2018

@jhford I think I fixed most (all?) of the comments. Regarding SHA256, I looked at it but it has no official documentation, but I know it is not PKCS7. Can we look at it in a follow-up PR?

One more thing: we currently depend on the node-forge master branch to make this work due to this and this.

@jhford
Copy link
Contributor

jhford commented Feb 28, 2018

@jhford I think I fixed most (all?) of the comments. Regarding SHA256, I looked at it but it has no official documentation, but I know it is not PKCS7. Can we look at it in a follow-up PR?

There's an rsa2048 file in the instance metadata that is an RSA-SHA256 signature that we could use. My understanding is that PKCS7 is a file format for storing information. It's not inherently RSA or DSA, SHA1 or SHA256. I think they just used the /pkcs7 endpoint for whatever reason.

One more thing: we currently depend on the node-forge master branch to make this work due to this and this.

Good to know. Do you know when (if) they're going to ship those in a release?

@walac walac force-pushed the secrets-endpoint branch 3 times, most recently from 8805d90 to f54c957 Compare February 28, 2018 20:35
@walac
Copy link
Author

walac commented Mar 1, 2018

There's a rsa2048 file in the instance metadata that is an RSA-SHA256 signature that we could use

Just to make sure we are on the same page, are you talking about http://169.254.169.254/latest/dynamic/instance-identity/signature ?

@jhford
Copy link
Contributor

jhford commented Jul 3, 2018

@walac just a heads up that the library for doing verification is up for review at taskcluster/iid-verify#1 when you have a chance :)

@walac walac force-pushed the secrets-endpoint branch 2 times, most recently from 484b2ae to a6f822e Compare August 29, 2018 15:55
@walac walac requested a review from jhford August 29, 2018 16:29
Copy link
Contributor

@jhford jhford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking like some good progress has been made. There's a couple issues that I think we should resolve before merging, but I think we're definitely in the right direction.

test/api_test.js Outdated
dmljZXMgTExDAgkAlrpI2eVeGmcwCQYFKw4DAhoFAKBdMBgGCSqGSIb3DQEJAzELBgkqhkiG9w0B
BwEwHAYJKoZIhvcNAQkFMQ8XDTE4MDEyMjEyMzAwMlowIwYJKoZIhvcNAQkEMRYEFK1hHB7W5la2
AWAHCWVgYPYyJzAxMAkGByqGSM44BAMELzAtAhUAsQXD04cP48o7HVHWJtVRHZEUkBICFHcuPVAu`,
doc: util.encode64(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is using the util.encode64 function from the node-forge package, which we're no longer using

test/api_test.js Outdated
const assume = require('assume');
const main = require('../lib/main');
const {api} = require('../lib/api');
const sinon = require('sinon');
const uuid = require('uuid');
const {util} = require('node-forge');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module is required, but isn't in package.json any longer. Since iid-verify is being used now, we can probably remove the requires.

test/api_test.js Outdated
certificate: 'myCertificate',
};

let doc = JSON.parse(util.decode64(pkcs7TestData.doc));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also using the forge package's util module

await testErrorReturn(client.getCredentials, pkcs7TestData.invalid);
});

it('claimed successfully', async() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test fails when I run it locally:

  1) Api credentials claimed successfully:
     Error: Failure to verify authorization
----
method:     getCredentials
errorCode:  AuthorizationFailed
statusCode: 403
time:       2018-08-29T17:26:45.582Z
      at node_modules/taskcluster-client/lib/client.js:327:21
      at node_modules/taskcluster-client/node_modules/promise/lib/core.js:33:15
      at flush (node_modules/taskcluster-client/node_modules/asap/asap.js:27:13)
      at process._tickCallback (internal/process/next_tick.js:61:11)

lib/api.js Outdated
validate(strDoc, req.body.signature);
} catch (err) {
log.error({
err: e,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where e is being defined. We can probably use {err} type notation here, since the local variable is call err and we're setting a property called err

package.json Outdated
"lodash": "^4.17.4",
"nyc": "^11.0.3",
"pg": "^7.4.0",
"pg-pool": "^2.0.3",
"promise-retry": "^1.1.1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only seeing this in test code, maybe we could move it to devDependencies if it's retained?

lib/state.js Outdated Show resolved Hide resolved
@@ -406,6 +416,44 @@ class State {
return result.rowCount === 1;
}

async claimCredentials({region, id}, client) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function looks great!

assert.equal(result.rowCount, 1, 'updating instance state had incorrect rowCount');
}

async canClaimCredentials({region, id}, client) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, looking great!

package.json Outdated
@@ -6,8 +6,7 @@
"author": "John Ford <john@johnford.org>",
"license": "MPL-2.0",
"engines": {
"node": "^9.11.1",
"yarn": "^1.0.0"
"node": ">=10"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the best practises document, we should be using a specific version of node.

@walac walac force-pushed the secrets-endpoint branch 5 times, most recently from 4c738bd to f0f1023 Compare August 29, 2018 20:41
@walac walac requested a review from jhford August 29, 2018 20:48
@walac
Copy link
Author

walac commented Aug 29, 2018

@jhford I believe I fixed all the problems, but Travis still fails and it feels like it is unrelated. Could you please confirm that?

@jhford
Copy link
Contributor

jhford commented Aug 29, 2018

@walac I believe that the problem is there's something funky going on with the Postgres installation in travis. I haven't ever been able to reproduce that failure locally or in heroku, but it happens on occasion in travis. I will be adding extra logging to see if I can figure out what's going on.

@@ -518,6 +519,13 @@ describe('State', () => {

});

it('should be able to claim credentials', async() => {
await db.insertInstance(defaultInst);
assume(await db.canClaimCredentials(defaultInst)).to.be.true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and .false need to be invoked with () or else they don't throw.

$ node
> let assume = require('assume')
undefined
> assume(false).to.be.true;
[Function: ok]
> assume(false).to.be.true();
Thrown: Error: Unknown assertation failure occured, assumed `false` to equal (===) true

@jhford
Copy link
Contributor

jhford commented Aug 29, 2018

@walac I've taken another look and a lot of the concerns aren't addressed. Maybe we should hop on vidyo and chat about it? I can meet tomorrow at 16:30 CEST (Berlin) if you'd like.

@walac
Copy link
Author

walac commented Aug 30, 2018

@jhford hah, I see a lot of your comments don't show up for me in the conversation tab.

@walac walac force-pushed the secrets-endpoint branch 3 times, most recently from e6f799f to 75ab61b Compare August 30, 2018 15:11
@walac walac requested a review from jhford August 30, 2018 15:18
Copy link
Contributor

@jhford jhford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, most of the feedback is addressed. I think the only things left to do here are figure out the unreliable test and add the facilities for picking the correct public key, then we're ready to merge I think.

// - http://www.pkiglobe.org/pkcs7.html
// - https://tools.ietf.org/html/rfc2315
function validate(doc, signedData) {
// TODO: use a per region certificate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put in the logic about finding the region and getting the right public key in this patch, so that the patch for the other public keys is only adding strings. Since we're already parsing the doc in the endpoint handler, maybe just pass in a region and throw an error if it's an unknown region?

lib/state.js Outdated
'INSERT INTO instances',
'(id, "workerType", region, az, "instanceType", state, "imageId", launched, "lastEvent")',
'VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)',
'INSERT INTO instances', '(' + [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason for a second level of strings being joined here.

test/api_test.js Outdated
async function testErrorReturn(fn, doc) {
// When we run "yarn test" we sometimes get ECONNREFUSED for no clear reason
// Let's add a retry to avoid this
let code = await promiseRetry(retry =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still concerned about the need to do promise retries here. If it's not possible to validate reliably in the unit tests, it will be difficult to get reliable outcomes in production. How often does this fail? I looked at the test file and it seems like we do wait for the server to finish starting, so getting ECONNREFUSED is something I cannot figure out in this case.

@walac walac requested a review from jhford September 3, 2018 11:33
Wander Lairson Costa added 4 commits October 8, 2018 14:08
claimCredentials receives a RSA2048 signature of the identity document
and returns the credentials for the worker.

The document is validated according using the iid-verify [1] package.

Informations about the instance, like instance-id and region
are extracted from the identity document.

The given instance must be in a running state, otherwise the endpoint
will fail.

As iid-verify requires node 10 or newer, we upgrade the required node
version, as well as packages that fail to build with this node version.

[1] https://www.npmjs.com/package/iid-verify
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants