Security layer upgrade (API) #1194

Closed
Pavel910 opened this issue Aug 20, 2020 · 9 comments
Labels
discussion, improvement, stale-issue

@Pavel910
Collaborator

Pavel910 commented Aug 20, 2020

During the past couple of months we've received many requests about our security layer: how to handle new Cognito user pools, authorize requests to the API, create custom signup flows, etc. One request stood out the most: support for 3rd party identity providers (like Okta, Auth0 and Ory) and fine-grained permission control on both the access and the business logic level.

This document contains our latest attempt at structuring the security layer. It describes our thought process and the moving parts involved at the code level.

✅ We've already successfully implemented it in one of our internal projects and we're very excited about getting it into master, but first we wanted to collect some feedback from the community! So grab some 🍿 and let's dive in!

Terminology

To set a common starting point, we'll first explain basic terms which will be used throughout the document:

General security concepts:

  • identity - verified information about a client (client is anyone/anything making a request to the API), usually pulled from a JWT token.
  • permission - a document/object containing information about what a permission holder can or can't do.
  • authentication - process of determining an identity (most commonly by verifying a JWT token). The simplest way to think of it is as the question "Who are you?"
  • authorization - process of verifying whether an identity has permission to access the requested resource or perform the requested action on the API. Think of it as the question "What are you allowed to do?"

Business logic concepts

  • user - a record in your DB that connects identity with your business logic (this will usually be information about billing, purchase history, addresses, etc., anything that is related to your business)

Security Framework (API)

If you're unfamiliar with how Webiny plugins work on the API side, please take a minute to familiarize yourself with them by watching the video about Webiny API Development. I've linked the exact timestamp that's important for our discussion.

The following diagram shows what we'll be talking about in the upcoming sections:

image

Authentication

First things first, we need a mechanism that will run authentication on every request. We've created a context plugin that simply looks for all security-authentication plugins and executes them one by one, until one of them returns an instance of the SecurityIdentity class. You can register as many authentication plugins as you want, and use different identity providers at the same time. We require only a minimal amount of data to construct an instance of SecurityIdentity:

type SecurityAuthenticationPlugin = Plugin & {
    type: "security-authentication";
    authenticate(context: any): Promise<SecurityIdentity | null>;
};

type SecurityIdentityData = {
    id: string; // usually a `sub` from your idToken
    login: string; // email, phone number, anything you consider a "username" or "login"
    type: string; // any string that will tell you what this identity is: "admin", "okta-user", "cognito-user", etc.
    [key: string]: any;
};

class SecurityIdentity {
    id: string;
    login: string;
    type: string;
    constructor(data: SecurityIdentityData) {
        Object.assign(this, data);
    }
}

Once these plugins are executed, you'll either end up having a verified identity, or not having an identity at all.
With this, you already have enough information to perform authorization.
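
To make the "execute them one by one" part concrete, here is a minimal sketch of that loop. It is not the actual context plugin: the `authenticate` helper, the two example plugins and the `context.token` field are made up for illustration, and the plugin type is reduced to the fields shown above (without the `Plugin` base type).

```typescript
type SecurityIdentityData = {
    id: string;
    login: string;
    type: string;
    [key: string]: any;
};

class SecurityIdentity {
    id!: string;
    login!: string;
    type!: string;
    [key: string]: any;
    constructor(data: SecurityIdentityData) {
        Object.assign(this, data);
    }
}

type SecurityAuthenticationPlugin = {
    type: "security-authentication";
    authenticate(context: any): Promise<SecurityIdentity | null>;
};

// Hypothetical helper: run the plugins in order and stop at the first identity.
async function authenticate(
    plugins: SecurityAuthenticationPlugin[],
    context: any
): Promise<SecurityIdentity | null> {
    for (const plugin of plugins) {
        const identity = await plugin.authenticate(context);
        if (identity instanceof SecurityIdentity) {
            return identity;
        }
    }
    // No plugin recognized the request: treat it as anonymous.
    return null;
}

// Made-up plugins: real ones would verify a JWT instead of comparing strings.
const oktaPlugin: SecurityAuthenticationPlugin = {
    type: "security-authentication",
    async authenticate(context) {
        return context.token === "okta-token"
            ? new SecurityIdentity({ id: "1", login: "jane@example.com", type: "okta-user" })
            : null;
    }
};

const cognitoPlugin: SecurityAuthenticationPlugin = {
    type: "security-authentication",
    async authenticate(context) {
        return context.token === "cognito-token"
            ? new SecurityIdentity({ id: "2", login: "john@example.com", type: "cognito-user" })
            : null;
    }
};
```

Calling `authenticate([oktaPlugin, cognitoPlugin], { token: "cognito-token" })` skips the first plugin (it returns null) and resolves with the identity produced by the second one.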

Controlling access to GraphQL resolvers

Ok, now we know who's making the request. The request now lands on a GraphQL Resolver, which is defined like this:

import { hasPermission } from "@webiny/api-security";

// resolver
hasPermission("pb.page.list")((_, args, context) => { /* resolver logic */ })

The hasPermission utility is a higher-order resolver. It calls context.security.getPermissions() (which is forwarded to your security-authorization plugin, covered in the next section), gets a list of permissions and checks whether any of them match pb.page.list.

For example, if you have a permission { name: "pb.page.list" } or { name: "pb.page.*" }, either one will match and the resolver will be executed.
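
For illustration, a higher-order resolver along these lines could look roughly like the sketch below. This is not the code from @webiny/api-security: the `matches` helper is a simplified stand-in for the real pattern matching (it only understands `*`), and throwing an `Error` on a failed check is an assumption about how a denied request would be reported.

```typescript
type SecurityPermission = { name: string; [key: string]: any };
type Context = { security: { getPermissions(): Promise<SecurityPermission[]> } };
type Resolver = (parent: any, args: any, context: Context) => any;

// Simplified stand-in for pattern matching: only `*` is treated as a wildcard.
function matches(pattern: string, name: string): boolean {
    const escaped = pattern
        .split("*")
        .map(part => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
        .join(".*");
    return new RegExp(`^${escaped}$`).test(name);
}

// Wraps a resolver and only runs it if one of the permissions matches `name`.
const hasPermission = (name: string) => (resolver: Resolver): Resolver =>
    async (parent, args, context) => {
        const permissions = await context.security.getPermissions();
        if (!permissions.some(permission => matches(permission.name, name))) {
            // Assumption: a denied request is reported by throwing.
            throw new Error(`Not authorized: missing permission "${name}".`);
        }
        return resolver(parent, args, context);
    };

// Usage: the wrapped resolver has the same signature as the original one.
const listPages = hasPermission("pb.page.list")(() => ["page-1", "page-2"]);
```

With a context whose getPermissions() resolves to [{ name: "pb.page.*" }], `listPages` runs the inner resolver; with [{ name: "cms.*" }] it rejects before the resolver logic is reached.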

Authorization

For authorization, you need to register a security-authorization plugin:

type SecurityAuthorizationPlugin = Plugin & {
    type: "security-authorization";
    getPermissions(context: Context): Promise<SecurityPermission[]>;
};

The whole purpose of this plugin is to return an array of permission objects:

type SecurityPermission = {
    name: string;
    [key: string]: any;
};

How you fetch your permissions is not that important; you can hardcode them into your plugin based on identity data, load them via another API, load them from a DB, anything goes here. The only requirement is that permissions follow the above SecurityPermission type. In the examples below, we'll use hardcoded permissions just to focus on the foundations of the security framework itself.

The cool thing is that we use minimatch to match permission names, so you can use wildcards to allow access to everything, by creating a permission { name: "*" }, or allowing access to everything in the CMS, by defining { name: "cms.*" }, and so on.

Controlling access for anonymous requests

Anonymous requests are those without a token (or any other way of performing authentication). With this security framework it's easy to handle these requests as well:

{
    type: "security-authorization",
    getPermissions(context) {
        if (!context.security.getIdentity()) {
            // anonymous request - only allow access to resolvers protected with `pb.page.listPublished` permission
            return [{ name: "pb.page.listPublished" }];
        }

        // verified identities - allow access to everything
        return [{ name: "*" }];
    }
}

Controlling business rules

Business rules can be anything you want, from "Only load your own documents" to "Only see data for the Product content model". They are defined by your app and are handled by you - the developer. The following image demonstrates how you could implement permissions control in your admin app, and how the UI maps to the actual permission object:

image

To add business rules, your permission objects can be expanded with business properties:

[
  {
    name: "cms.content.list",
    own: true,
    models: ["product", "category"]
  },
  {
    name: "pb.page.list",
    own: true,
    locales: ["de", "fr"]
  }
]

Once you have this, you can now use these permissions in your resolvers:

// First, let's check if the request is even allowed to access this resolver, using the "hasPermission" utility
hasPermission("cms.content.list")(async (_, args, context) => {
  const { Product } = context.models;
  const identity = context.security.getIdentity();

  // Get permission by name (we already know this permission exists, because it was checked by "hasPermission")
  const permission = await context.security.getPermission("cms.content.list");

  const query = {};

  // Only load your own documents
  if (permission.own) {
    query.createdBy = identity.id;
  }

  // Only load content in specific categories
  if (permission.categories) {
    query.category = { $in: permission.categories };
  }

  return await Product.find({ query });
});

Applying the concept to service-oriented architecture

Now that we know the fundamentals, it's time to scale it up to dozens of Federated GraphQL services 🤯. See this article to better understand what I'm talking about: https://docs.webiny.com/docs/deep-dive/architecture/api

To perform authorization, we'll often require access to the User model, the user's Roles, the Groups they belong to, or even need to process some completely arbitrary business logic to determine the user's permissions. Maybe even load permissions from 2-3 remote APIs, maybe even depending on weather conditions 😉 😄 (joking, of course). You get the idea - loading of permissions can be performed in any number of ways, and it all depends on your project.

To solve this problem, we're introducing a new utility Lambda function, called Permissions Manager. It's a simple function, based on plugins, in which you define your logic for loading permissions based on the identity and identity type determined during authentication. It is the one place where your logic for loading permissions will live, and all the other services will invoke this function to ask for the permissions of a given identity.

This solves the problem of duplicating the code in all the services just because you need to perform authorization. It also makes it extremely easy to modify the logic for loading permissions: you do it in one place, and deploy a single function.

The package @webiny/api-security-permissions-manager will contain the aforementioned Lambda handler and a client that will be imported in every service, to provide the security-authorization plugin implementation. The moment authorization is required, getPermissions will be executed, and PermissionsManagerClient will invoke the centralized Lambda function to load permissions for the given identity.
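
The delegation described above can be sketched as follows. The package's real client API isn't shown in this post, so `createAuthorizationPlugin`, `InvokePermissionsManager` and `fakePermissionsManager` are hypothetical names, and the Lambda invocation is replaced by an in-memory function so the example runs on its own.

```typescript
type SecurityPermission = { name: string; [key: string]: any };
type Identity = { id: string; type: string };
type Context = { security: { getIdentity(): Identity | null } };

// Hypothetical transport: in a real setup this would invoke the central Lambda function.
type InvokePermissionsManager = (payload: {
    identity: Identity | null;
}) => Promise<SecurityPermission[]>;

// Hypothetical factory: every service registers the plugin it returns.
function createAuthorizationPlugin(invoke: InvokePermissionsManager) {
    return {
        type: "security-authorization" as const,
        getPermissions(context: Context): Promise<SecurityPermission[]> {
            // The service only forwards the identity; the loading logic lives elsewhere.
            return invoke({ identity: context.security.getIdentity() });
        }
    };
}

// In-memory stand-in for the central function: the single place that
// maps an identity (and its type) to permissions.
const fakePermissionsManager: InvokePermissionsManager = async ({ identity }) => {
    if (!identity) {
        return [{ name: "pb.page.listPublished" }];
    }
    return identity.type === "admin" ? [{ name: "*" }] : [{ name: "cms.*" }];
};

const authorizationPlugin = createAuthorizationPlugin(fakePermissionsManager);
```

Changing how permissions are loaded then means changing `fakePermissionsManager` (or, in a real project, the centralized function) without touching the services that use the plugin.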

See the implementation of the api-security-permissions-manager package. It's quite simple, and doesn't contain a lot of code.

NOTE: existence of this function is optional. If you don't find it useful, you won't have it in your project.

Summary

With this we've covered the API side of the story. You can mix and match, combine and customize to your heart's content. You can support multiple identity sources and multiple permissions sources; there are not that many constraints. All we require is that your permission is defined as an object with a name property.

Security Framework (React)

This article is a work in progress; the React side of things is coming very soon.

@markwilcox
Contributor

This looks super-flexible to me. I only have two questions:

  1. Can we support cost, performance, security and complexity optimisations for simplified cases e.g.:
    a) I want all of my users in Cognito, no access without an account, and the authentication (and maybe authorisation) to happen on the API Gateway.
    b) In future (I know you're not using the v2 HTTP APIs yet) could we use the JWT authentication that's built-in for a third party provider?

I assume yes, because the identity comes in on the context in those cases anyway, so the plugin can just get it from there, but just checking you've considered this case.

  2. For the permissions manager, why a separate lambda function rather than just a reusable code module? You'll presumably always need to wait on a permissions check before executing the rest of the request, and having a lambda function sitting waiting for another to be invoked and return a result just adds latency and cost, particularly when both functions may then be sitting there waiting for a call to a third party identity provider or a database. That could compound with federation, since multiple federated services executed in parallel for the same request may then all be trying to do the same permissions check and triggering multiple concurrent executions in parallel.

This is very similar to the database proxy service you have for MongoDB, but that's a necessary evil to help with the zombie connection issue. I'm not clear from the description above what problem making the permissions manager a separate lambda function solves, aside from the architectural cleanliness of a single responsibility per function?

@Pavel910
Collaborator Author

Pavel910 commented Aug 21, 2020

@markwilcox
1.a - Absolutely. Everything described above kicks in once the request lands on the actual Apollo Server. If you want to lock your API on the API Gateway level, you will simply attach an authorizer and prevent requests from ever hitting the GraphQL Lambda.

1.b - not 100% sure what you mean, but once the request hits the GraphQL API, your security-authentication plugin is responsible for handling JWT tokens, so whatever Identity Provider you're using, you'll simply create a plugin that handles those specific JWTs (Okta, Auth0,...).

2 - Permissions Manager is optional, you don't have to use it if you don't like the approach. However, it will save you from including the same code in all the services and functions, reducing your bundle size for each of your services. It all depends on how you are handling permissions in your project, and how complex/heavy the "user control" is. An additional benefit is, if you want to cache the permissions per identity, you can do it in that Permissions Manager lambda, so you don't have to query DB each time you need to load permissions. Again, this is something you as a project owner will have to judge and decide by yourself, whether it's of any value to you.

In case of a distributed team, working on different parts of the system, it's a lot easier to have this one lambda responsible for permissions, than having each team import the whole "user management" code into their own service just to perform permissions loading. The team in question may not even be the owners of the code.

In any case, the point is that this approach is VERY flexible. And you can build anything you like on top of it, and really bend it to your will.

@markwilcox
Contributor

Thanks @Pavel910. For 1.b I'm talking about this feature of the API Gateway HTTP APIs (as opposed to the REST APIs): https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-jwt-authorizer.html
You can have API Gateway validate e.g. Auth0 users JWTs for you.

For 2, yes I wasn't really considering the large distributed team case and that's very valid. Reducing bundle size is mostly a cold start latency (and hence also possible cost) optimisation, and you'd be undoing that by adding synchronous execution of a second lambda function. I think the whole approach just triggered alarm bells because having one lambda function wait on the execution of another is a bit of a serverless anti-pattern with few good exceptions. Like you say, I don't have to use it though! I think I also missed that it really just runs the authorisation plugins in another lambda function and nothing else.

@Pavel910
Collaborator Author

Pavel910 commented Aug 21, 2020

@markwilcox actually it just loads the identity's permissions from arbitrary sources (usually it will be from your User model, so single source, but it's flexible so large teams may load permissions from several sources). The authorization (actual checks of permissions) is still happening on each service independently. So that centralized lambda is basically like a service to fetch all permissions that belong to a given identity. Sometimes it will be useful, sometimes I guess not, but like with everything in software - there are no silver bullets; the best we can do is make it flexible and explain the PROs and CONs to the developers, so they can decide how they want to run it for themselves.

Btw. you can see the complete code for that PM handler: https://github.com/webiny/webiny-js/blob/feat/new-security/packages/api-security-permissions-manager/src/handler/index.ts

It's a very small handler, that literally just executes your logic (via the plugin you add to the handler). Once running, I think the execution times will be in range from 20-100ms, and especially when we move to DynamoDB and remove the DbProxy, it will be even better. Many options there, I guess we'll see what works best as we go and actually start using it, so the community can provide actual usage feedback.

Thanks for engaging in the discussion, it's really helpful to hear different perspectives 👍

@markwilcox
Contributor

markwilcox commented Aug 21, 2020

and especially when we move to DynamoDB and remove the DbProxy

🙏
Fetching permissions for a single user from DynamoDB should be low single digit milliseconds. Then if I used a separate lambda function I'd still get charged for 100ms. I know some organisations will rightly prioritise their organisational efficiency over that kind of cost efficiency though, so I totally respect your stance on offering flexibility. Thanks for all the replies!

@Pavel910 Pavel910 changed the title from "WIP: Security layer upgrade" to "Security layer upgrade (API)" Aug 21, 2020
@Pavel910 Pavel910 added this to To do in Webiny v5 via automation Aug 28, 2020
@Pavel910
Collaborator Author

Pavel910 commented Sep 1, 2020

The second part of this was just added: #1206
@markwilcox Would be interested in hearing your opinion, please take a look 🚀 Cheers!

@Pavel910 Pavel910 moved this from To do to In progress in Webiny v5 Sep 11, 2020
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 19, 2020
@adrians5j adrians5j removed the Stale label Oct 19, 2020
@webiny-bot
Collaborator

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@webiny-bot webiny-bot added and removed the stale-issue label Dec 18, 2020
@adrians5j adrians5j unpinned this issue Feb 12, 2021
@webiny-bot
Collaborator

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@webiny-bot webiny-bot added the stale-issue label Feb 17, 2021
Webiny v5 automation moved this from In progress to Done Feb 24, 2021