Merge branch 'master' into fix/authz-signed-urls
Avantol13 committed Mar 10, 2020
2 parents a8b791a + d3ce3bc commit 76c1918
Showing 19 changed files with 649 additions and 289 deletions.
4 changes: 2 additions & 2 deletions Dockerfile
@@ -1,14 +1,14 @@
# To run: docker run -d -v /path/to/fence-config.yaml:/var/www/fence/fence-config.yaml --name=fence -p 80:80 fence
# To check running container: docker exec -it fence /bin/bash

-FROM quay.io/cdis/python-nginx:pybase3-1.1.0
+FROM quay.io/cdis/python-nginx:pybase3-1.2.0

ENV appname=fence

RUN apk update \
&& apk add postgresql-libs postgresql-dev libffi-dev libressl-dev \
&& apk add linux-headers musl-dev gcc \
-&& apk add curl bash git vim make
+&& apk add curl bash git vim make lftp

COPY . /$appname
COPY ./deployment/uwsgi/uwsgi.ini /etc/uwsgi/uwsgi.ini
2 changes: 2 additions & 0 deletions bin/fence-create
@@ -352,6 +352,8 @@ def main():
         os.path.dirname(os.path.realpath(__file__))
     )
     dbGaP = os.environ.get("dbGaP") or config.get("dbGaP")
+    if not isinstance(dbGaP, list):
+        dbGaP = [dbGaP]
     STORAGE_CREDENTIALS = os.environ.get("STORAGE_CREDENTIALS") or config.get(
         "STORAGE_CREDENTIALS"
     )
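The two added lines in `bin/fence-create` let the `dbGaP` setting be either a single server mapping (the original layout) or a list of them, so the rest of the script can always iterate over a list. A minimal standalone sketch of that normalization follows. The helper name `normalize_dbgap_config` and the sample values are hypothetical, and unlike the two-line check in the diff, this sketch maps a missing setting (`None`) to an empty list rather than `[None]`:

```python
from typing import Any, Dict, List, Optional, Union

ServerConfig = Dict[str, Any]


def normalize_dbgap_config(
    dbgap: Optional[Union[ServerConfig, List[ServerConfig]]]
) -> List[ServerConfig]:
    """Always return the dbGaP setting as a list of server configs (hypothetical helper)."""
    if dbgap is None:
        # Deviation from the diff: treat "not configured" as "no servers".
        return []
    if not isinstance(dbgap, list):
        # Older single-server layout: wrap it in a list, as the diff does.
        return [dbgap]
    return dbgap


# Hypothetical single-server config in the pre-existing layout
single = {"info": {"host": "sftp.example.org", "username": "user"}, "protocol": "sftp"}

for server in normalize_dbgap_config(single):
    print(server["protocol"])
```

Normalizing once at load time keeps the sync code free of "is it a dict or a list?" checks.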
6 changes: 3 additions & 3 deletions docs/fence_shibboleth.md
@@ -7,7 +7,7 @@ Shibboleth Single Sign-On and Federating Software is a standards based, open sou

Shibboleth is part of the InCommon Trusted Access Platform, an IAM software suite that is packaged for easy installation and configuration. InCommon operates the identity management federation for U.S. research and education, and their sponsored partners. InCommon uses SAML-based authentication and authorization systems (such as Shibboleth) to enable scalable, trusted collaborations among its community of participants.

-To enable InCommon login, Shibboleth must be set up in a multi-tenant Fence instance, which lets us log in through InCommon by specifying the `shib_idp` parameter (as of Fence release 4.7.0 and Fence-shib release 2.7.2). If no `shib_idp` is specified (or if using an earlier Fence version), users will be redirected to the NIH login page by default.
+To enable InCommon login, Shibboleth must be set up in a multi-tenant Fence instance, which lets us log in through InCommon by specifying the `shib_idp` parameter (as of Fence release 4.7.0 and Fenceshib release 2.7.2). If no `shib_idp` is specified (or if using an earlier Fence version), users will be redirected to the NIH login page by default.

Note that in Fence, we use the terms "Shibboleth" and "InCommon" interchangeably.

@@ -26,7 +26,7 @@ After the user logs in and is redirected to `/login/shib/login`, we get the `epp
Notes about the NIH login implementation:
- NIH login is used as the default when the `idp` is fence and no `shib_idp` is specified (for backwards compatibility).
- NIH login requires special handling because it uses slightly different login endpoints than other InCommon providers.
-- When a user logs into NIH with an eRA commons ID, only the `persistent-id` is returned. For other NIH logins, both `eppn` and `persistent-id` are returned. This is why when a user logs in through NIH, we use the `persistent-id` as the username even when the `eppn` is provided (for backwards compatibility).
+- When a user logs into NIH with an eRA commons ID, only the `persistent-id` is returned. For other NIH logins, both `eppn` and `persistent-id` are returned. When a user logs in through NIH, we use the `persistent-id` as the username even when the `eppn` is provided for backwards compatibility.

## Configuration

@@ -54,7 +54,7 @@ The Shibboleth configuration can be checked inside the Fenceshib pod under `/etc

### In the Commons which is set up with InCommon login

-Register an OIDC client using [this `fence-create` command](https://github.com/uc-cdis/fence#register-internal-oauth-client), the redirect url should be `<COMMONS_URL>/user/login/fence/login`.
+You will need to register this Fence as an OIDC client to the multi-tenant Fence using [this `fence-create` command](https://github.com/uc-cdis/fence#register-internal-oauth-client); the redirect URL should be `<COMMONS_URL>/user/login/fence/login`.

The Fence configuration enables the `fence` provider (multi-tenant Fence setup) with the `shibboleth` provider (provider to be used by the multi-tenant Fence instance):
```
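The NIH notes in `docs/fence_shibboleth.md` describe a username rule: for NIH logins, Fence uses `persistent-id` as the username even when `eppn` is also returned. A small sketch of that rule follows. The function name is hypothetical, and the non-NIH branch (prefer `eppn`, fall back to `persistent-id`) is an assumption, since the hunk header describing the general flow is truncated:

```python
from typing import Optional


def pick_shib_username(
    is_nih_login: bool, eppn: Optional[str], persistent_id: Optional[str]
) -> Optional[str]:
    """Pick the username from Shibboleth attributes (hypothetical helper)."""
    if is_nih_login:
        # eRA Commons logins only return persistent-id, so NIH logins
        # always use it, even when eppn is present (backwards compatibility).
        return persistent_id
    # Assumption: other InCommon providers are identified by eppn,
    # falling back to persistent-id when eppn is missing.
    return eppn or persistent_id


print(pick_shib_username(True, "user@example.edu", "pid-123"))
```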
13 changes: 10 additions & 3 deletions docs/google_architecture.md
@@ -83,7 +83,7 @@ In order to fully understand the options for requester pays support, it's import
The easiest option for supporting requester pays is to simply bill a Google Project you already own for all access to the bucket instead of requiring end-users to supply a project to bill. This essentially makes the requester pays bucket a non-requester pays bucket, since you'll be paying for all the access. This may be a necessary solution in cases where:

1) you want to serve data from a bucket you don't fully control (in other words, can't just turn "requester pays" off)
-2) you don't want end-users to have to do manual configuration in Google Cloud Platform to enable billing their project
+2) you don't want end-users (or client applications) to have to do configuration in Google Cloud Platform to enable billing their own project
3) you/end-users don't want to have to give your application IAM permissions in a project the end-user owns to automatically enable billing

**NOTE:** If you do _not_ want to bill yourself for access, it is possible to require end-users to provide the project to bill OR configure a default billing project other than one you own. _However_, this will require more work for end-users that you need to consider.
@@ -110,18 +110,25 @@ Whether you bill your own project, or require end-users to specify a billing pro

> "All actions that include a billing project in the request require serviceusage.services.use permission for the project that's specified" [according to Google's docs](https://cloud.google.com/storage/docs/access-control/iam-console).
-You have 2 options to achieve the above:
+You have 3 options to achieve the above:

1) assume end-users will provide the necessary permission for billing
2) configure Fence to automatically attempt to provide the necessary permission for billing
+3) create a client application that automatically provides the necessary permission for billing

-If you want Fence to automatically attempt to provide the necessary permissions to the relevant service accounts for data access, the Fence admin service account needs a couple pre-defined Google roles (through their Cloud IAM) on whatever project is provided for billing (be that in a request to Fence or whatever is configured as the "default billing project"):
+If you want Fence to automatically attempt to provide the necessary permissions to the relevant service accounts for data access (option 2 above), the Fence admin service account needs a couple pre-defined Google roles (through their Cloud IAM) on whatever project is provided for billing (be that in a request to Fence or whatever is configured as the "default billing project"):

* `Project IAM Admin`: to update the project's policy to give the necessary service account(s) billing permission
* `Role Administrator`: for creating a custom role that only provides billing permission to the project

> NOTE: The custom role that Fence creates contains the single permission in Google `serviceusage.services.use`.
+If you want to create a client application that will be able to give the right service accounts
+billing permission (option 3), then there is some additional information you should know about. The
+userinfo endpoint supplies the user's `primary_google_service_account`, which is the email address of the Google Service Account attached to that user for the creation of Signed URLs.
+
+> NOTE: Users' Primary Google Service Accounts are created lazily, so it is possible that `primary_google_service_account` is `None`/`null` if the user has not previously requested data via a Google Data Access method. To make sure it is set, the user must make an API call to access Google data they have access to before userinfo is read.
#### Requester Pays Signed URLs and Temporary Service Account Credentials

1) For [Signed URLs](#signed-urls): a `userProject=<google-project-to-bill>` query parameter will be appended to the signed url
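Options 2 and 3 in `docs/google_architecture.md` both come down to binding the user's service account to a role containing `serviceusage.services.use` on the billing project. The sketch below shows that policy edit on a plain dict shaped like a Google Cloud IAM policy (a `bindings` list of `role`/`members` entries, with members written as `serviceAccount:<email>`), plus the `null` check the NOTE about lazily created service accounts calls for. The custom role ID and email are placeholders, the helper names are hypothetical, and fetching and writing the real policy (for example with the Resource Manager `getIamPolicy`/`setIamPolicy` methods) is left out:

```python
import copy
from typing import Any, Dict, Optional

Policy = Dict[str, Any]


def billing_service_account(userinfo: Dict[str, Any]) -> Optional[str]:
    """Return the user's primary Google service account from a userinfo response.

    The value may be None/null because these accounts are created lazily.
    """
    return userinfo.get("primary_google_service_account")


def grant_role(policy: Policy, role: str, service_account_email: str) -> Policy:
    """Return a copy of `policy` with the service account bound to `role`.

    Idempotent: granting twice does not duplicate the member.
    """
    updated = copy.deepcopy(policy)
    member = "serviceAccount:{}".format(service_account_email)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Placeholder values for illustration only
role = "projects/my-billing-project/roles/billingAccessOnly"
userinfo = {"primary_google_service_account": "user-1@my-commons.iam.gserviceaccount.com"}

sa_email = billing_service_account(userinfo)
if sa_email is None:
    print("no service account yet; request Google data through Fence first")
else:
    new_policy = grant_role({"bindings": []}, role, sa_email)
    print(new_policy["bindings"])
```

Returning a modified copy keeps the original policy (and its `etag`, if present) intact for comparison before anything is written back.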
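For requester-pays Signed URLs, the doc says a `userProject=<google-project-to-bill>` query parameter is appended to the signed URL. A stdlib sketch of that string handling follows; the helper name, bucket, URL, and project ID are placeholders. It appends to the existing query instead of re-encoding it, so the signature parameters stay byte-for-byte unchanged. Whether a given signing scheme accepts a parameter added after signing is outside what this sketch shows:

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def add_user_project(signed_url: str, billing_project: str) -> str:
    """Append userProject=<billing_project> without touching existing params."""
    parts = urlsplit(signed_url)
    extra = urlencode({"userProject": billing_project})
    query = "{}&{}".format(parts.query, extra) if parts.query else extra
    return urlunsplit(parts._replace(query=query))


url = "https://storage.googleapis.com/my-bucket/file.bam?Expires=1583870400&Signature=abc%2Bdef"
print(add_user_project(url, "my-billing-project"))
```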
Binary file modified docs/images/seq_diagrams/shibboleth_flow.png
42 changes: 35 additions & 7 deletions docs/usersync.md
@@ -1,12 +1,40 @@
# Usersync

-Usersync is a script that parses user access information from multiple sources (user.yaml files, dbGaP user authorization telemetry files AKA whitelists) and keeps users' access to Gen3 resources up to date by updating the Fence and Arborist databases.
+Usersync is a script that parses user access information from multiple sources (user.yaml files, dbGaP user authorization "telemetry" files AKA whitelists) and keeps users' access to Gen3 resources up to date by updating the Fence and Arborist databases.



## Usersync flow

![Usersync Flow](images/usersync.png)

-> Note that at the time of writing, the user.yaml file overrides the access obtained from the telemetry files. In the future, usersync will combine the access instead.
+> The access from the user.yaml file and the dbGaP authorization files is combined (see example below), but the user.yaml file overrides the user information (such as email) obtained from the dbGaP authorization files.
+## Configuration
+
+Configuration for user sync lives in fence-config.yaml for each respective environment. An example of the fence-config can be found in [fence/config-default.yaml](https://github.com/uc-cdis/fence/blob/master/fence/config-default.yaml).
+
+You can configure one or more dbGaP SFTP servers to sync telemetry files from. To configure a single dbGaP server, add credentials and information to the fence-config.yaml under `dbGaP`, as outlined [here](https://github.com/uc-cdis/fence/blob/4.14.0/fence/config-default.yaml#L389-L433).
+
+To configure additional dbGaP servers, include in the config.yaml a list of dbGaP servers under `dbGaP`, like so:
+
+```
+dbGaP:
+- info:
+    host:
+    username:
+    password:
+    ...
+  protocol: 'sftp'
+  ...
+  ...
+- info:
+    host:
+    username:
+    ...
+```
+
+An example can be found in the config used for unit testing [tests/test-fence-config.yaml](https://github.com/uc-cdis/fence/blob/master/tests/test-fence-config.yaml).

## Usersync result example

@@ -18,7 +46,7 @@ Usersync is a script that parses user access information from multiple sources (
```
# authz information follows the attribute-based access control (ABAC) model
authz:
-resources:
+resources:
- name: programs
subresources:
- name: myprogram
@@ -125,7 +153,7 @@ users:
```
</details>

-### Example of telemetry file (CSV format):
+### Example of dbGaP authorization file (CSV format):

```
user name, login, authority, role, email, phone, status, phsid, permission set, created
@@ -135,7 +163,7 @@ Mrs. GHI,GHI,eRA,PI,ghi@com,"123-456-789",active,phs3.v2.p3.c4,"General Research

Usersync gives users "read" and "read-storage" permissions to the dbGaP studies.

-> Note: The dbGaP telemetry files contain consent codes that can be parsed by usersync: [more details here](dbgap_info.md). This simplified example does not include consent code parsing.
+> Note: The dbGaP authorization files contain consent codes that can be parsed by usersync: [more details here](dbgap_info.md). This simplified example does not include consent code parsing.
### Resulting access:

@@ -148,9 +176,9 @@ Usersync gives users "read" and "read-storage" permissions to the dbGaP studies.
- /open: read + read-storage
- /programs/phs1: read
- /programs/phs2: read
-- /programs/phs3: read + read-storage _(from the telemetry file)_
+- /programs/phs3: read + read-storage _(from the dbGaP authorization file)_
- user GHI:
-- /programs/phs3: create _(user.yaml access overrides telemetry file access)_
+- /programs/phs3: read + read-storage + create _(user.yaml access combined with dbGaP authorization file access)_

## Validation
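The CSV example in `docs/usersync.md` maps each row's `phsid` to a study on which usersync grants `read` and `read-storage`. A simplified stdlib sketch of that mapping follows. The function name and the sample rows are hypothetical, and, like the doc's simplified example, it ignores consent codes (the trailing `.cN` in `phsid`) and all other columns:

```python
import csv
import io
from typing import Dict, Set

DBGAP_PERMISSIONS = {"read", "read-storage"}


def parse_dbgap_authz(text: str) -> Dict[str, Dict[str, Set[str]]]:
    """Map login -> {study accession: permissions} (hypothetical, simplified)."""
    access: Dict[str, Dict[str, Set[str]]] = {}
    # skipinitialspace strips the blanks after commas in the header row
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        # e.g. "phs5.v1.p1.c1" -> "phs5"; version, participant set and
        # consent code are dropped in this simplified version
        study = row["phsid"].split(".")[0]
        access.setdefault(row["login"], {})[study] = set(DBGAP_PERMISSIONS)
    return access


# Hypothetical rows using the header from the doc's example
sample = (
    "user name, login, authority, role, email, phone, status, phsid, permission set, created\n"
    'Dr. XYZ,XYZ,eRA,PI,xyz@example.org,"555-0100",active,phs5.v1.p1.c1,"General Research Use",2020-01-01\n'
    'Dr. XYZ,XYZ,eRA,PI,xyz@example.org,"555-0100",active,phs6.v2.p1.c2,"Disease-Specific",2020-01-01\n'
)
print(sorted(parse_dbgap_authz(sample)["XYZ"]))
```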

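The updated note and the user GHI result in `docs/usersync.md` describe two merge rules: permissions from user.yaml and from the dbGaP authorization files are unioned per resource, while user information such as email comes from user.yaml when both sources provide it. A minimal sketch of those two rules on plain dicts follows; the function names are hypothetical and this is not usersync's actual code:

```python
from typing import Dict, Set

Access = Dict[str, Set[str]]  # resource path -> permissions


def combine_access(user_yaml: Access, dbgap: Access) -> Access:
    """Union the permissions granted by both sources for each resource."""
    combined: Access = {}
    for source in (dbgap, user_yaml):
        for resource, permissions in source.items():
            combined.setdefault(resource, set()).update(permissions)
    return combined


def merge_user_info(user_yaml_info: Dict[str, str], dbgap_info: Dict[str, str]) -> Dict[str, str]:
    """user.yaml values override dbGaP values for the same key."""
    return {**dbgap_info, **user_yaml_info}


# User GHI from the example: "create" from user.yaml, "read" + "read-storage" from dbGaP
ghi = combine_access(
    {"/programs/phs3": {"create"}},
    {"/programs/phs3": {"read", "read-storage"}},
)
print(sorted(ghi["/programs/phs3"]))  # ['create', 'read', 'read-storage']
```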
43 changes: 43 additions & 0 deletions fence/__init__.py
@@ -8,6 +8,7 @@
from userdatamodel.driver import SQLAlchemyDriver

from fence.auth import logout, build_redirect_url
+from fence.blueprints.data.indexd import S3IndexedFileLocation
from fence.blueprints.login.utils import allowed_login_redirects, domain
from fence.errors import UserError
from fence.jwt import keys
@@ -41,6 +42,8 @@

from cdislogging import get_logger

+from cdispyutils.config import get_value
+
from gen3authz.client.arborist.client import ArboristClient

# Can't read config yet. Just set to debug for now, else no handlers.
@@ -166,6 +169,44 @@ def public_keys():
     )


+def _check_s3_buckets(app):
+    """
+    Function to ensure that all s3_buckets have a valid credential.
+    Additionally, if there is no region it will produce a warning then tries to fetch and cache the region.
+    """
+    buckets = config.get("S3_BUCKETS") or {}
+    aws_creds = config.get("AWS_CREDENTIALS") or {}
+
+    for bucket_name, bucket_details in buckets.items():
+        cred = bucket_details.get("cred")
+        region = bucket_details.get("region")
+        if not cred:
+            raise ValueError(
+                "No cred for S3_BUCKET: {}. cred is required.".format(bucket_name)
+            )
+        if cred not in aws_creds and cred != "*":
+            raise ValueError(
+                "Credential {} for S3_BUCKET {} is not defined in AWS_CREDENTIALS".format(
+                    cred, bucket_name
+                )
+            )
+        if not region:
+            logger.warning(
+                "WARNING: no region for S3_BUCKET: {}. Providing the region will reduce"
+                " response time and avoid a call to GetBucketLocation which you may lack the AWS ACLs for.".format(
+                    bucket_name
+                )
+            )
+            credential = S3IndexedFileLocation.get_credential_to_access_bucket(
+                bucket_name,
+                aws_creds,
+                config.get("MAX_PRESIGNED_URL_TTL", 3600),
+                app.boto,
+            )
+            region = app.boto.get_bucket_region(bucket_name, credential)
+            config["S3_BUCKETS"][bucket_name]["region"] = region
+
+
 def app_config(
     app, settings="fence.settings", root_dir=None, config_path=None, file_name=None
 ):
@@ -208,6 +249,8 @@ def app_config(

     _setup_oidc_clients(app)

+    _check_s3_buckets(app)
+

 def _setup_data_endpoint_and_boto(app):
     if "AWS_CREDENTIALS" in config and len(config["AWS_CREDENTIALS"]) > 0:
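`_check_s3_buckets` runs at the end of `app_config`: every `S3_BUCKETS` entry needs a `cred` that is either `"*"` or a key of `AWS_CREDENTIALS`, and a bucket without a `region` triggers a warning plus a `GetBucketLocation` lookup whose result is cached back into the config. A minimal sketch of just the validation step follows, with the AWS lookup left out so it runs without credentials. The function name and the sample config are hypothetical:

```python
from typing import Any, Dict, List


def buckets_missing_region(
    buckets: Dict[str, Dict[str, Any]], aws_creds: Dict[str, Any]
) -> List[str]:
    """Validate each bucket's `cred` and return the buckets with no `region`.

    Raises ValueError for the same two misconfigurations as the diff.
    """
    missing = []
    for bucket_name, details in buckets.items():
        cred = details.get("cred")
        if not cred:
            raise ValueError(
                "No cred for S3_BUCKET: {}. cred is required.".format(bucket_name)
            )
        if cred not in aws_creds and cred != "*":
            raise ValueError(
                "Credential {} for S3_BUCKET {} is not defined in AWS_CREDENTIALS".format(
                    cred, bucket_name
                )
            )
        if not details.get("region"):
            # Fence would log a warning here and look the region up with boto
            missing.append(bucket_name)
    return missing


# Hypothetical config fragments
S3_BUCKETS = {
    "bucket-with-region": {"cred": "CRED1", "region": "us-east-1"},
    "bucket-without-region": {"cred": "*"},
}
AWS_CREDENTIALS = {"CRED1": {"aws_access_key_id": "", "aws_secret_access_key": ""}}

print(buckets_missing_region(S3_BUCKETS, AWS_CREDENTIALS))  # ['bucket-without-region']
```

Failing fast at startup surfaces a bad `cred` reference when the app boots, rather than on the first signed-URL request for that bucket.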
23 changes: 17 additions & 6 deletions fence/blueprints/data/indexd.py
@@ -491,13 +491,22 @@ class S3IndexedFileLocation(IndexedFileLocation):
     """

     @classmethod
-    def assume_role(cls, bucket_cred, expires_in, aws_creds_config):
+    def assume_role(cls, bucket_cred, expires_in, aws_creds_config, boto=None):
+        """
+        Args:
+            bucket_cred
+            expires_in
+            aws_creds_config
+            boto (optional): provide `boto` when calling this function
+                outside of application context, to avoid errors when
+                using `flask.current_app`.
+        """
+        boto = boto or flask.current_app.boto
+
         role_arn = get_value(
             bucket_cred, "role-arn", InternalError("role-arn of that bucket is missing")
         )
-        assumed_role = flask.current_app.boto.assume_role(
-            role_arn, expires_in, aws_creds_config
-        )
+        assumed_role = boto.assume_role(role_arn, expires_in, aws_creds_config)
         cred = get_value(
             assumed_role, "Credentials", InternalError("fail to assume role")
         )
@@ -535,7 +544,9 @@ def bucket_name(self):
         return None

     @classmethod
-    def get_credential_to_access_bucket(cls, bucket_name, aws_creds, expires_in):
+    def get_credential_to_access_bucket(
+        cls, bucket_name, aws_creds, expires_in, boto=None
+    ):
         s3_buckets = get_value(
             config, "S3_BUCKETS", InternalError("buckets not configured")
         )
@@ -570,7 +581,7 @@ def get_credential_to_access_bucket(cls, bucket_name, aws_creds, expires_in):
             InternalError("aws credential of that bucket is not found"),
         )
         return S3IndexedFileLocation.assume_role(
-            bucket_cred, expires_in, aws_creds_config
+            bucket_cred, expires_in, aws_creds_config, boto
         )

     def get_bucket_region(self):
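The `indexd.py` change threads an optional `boto` argument through `get_credential_to_access_bucket` into `assume_role`, falling back to `flask.current_app.boto` only when none is passed. That is what lets `_check_s3_buckets` hand over `app.boto` during `app_config`, where (per the new docstring) relying on `flask.current_app` can fail. A framework-free sketch of the same fallback follows; `FakeBoto` and `_boto_from_app_context` are hypothetical stand-ins for Fence's boto wrapper and for `flask.current_app.boto`:

```python
from typing import Any, Dict, Optional


class FakeBoto:
    """Hypothetical stand-in for Fence's boto wrapper."""

    def assume_role(
        self, role_arn: str, expires_in: int, aws_creds_config: Dict[str, Any]
    ) -> Dict[str, Any]:
        # Shaped like an STS AssumeRole response: credentials under "Credentials"
        return {"Credentials": {"RoleArn": role_arn, "ExpiresIn": expires_in}}


def _boto_from_app_context() -> Any:
    # Stands in for `flask.current_app.boto`, which raises RuntimeError
    # ("Working outside of application context.") when no app context exists.
    raise RuntimeError("Working outside of application context.")


def assume_role(
    bucket_cred: Dict[str, Any],
    expires_in: int,
    aws_creds_config: Dict[str, Any],
    boto: Optional[Any] = None,
) -> Dict[str, Any]:
    # An explicitly passed client wins; because `or` short-circuits, the
    # app-context lookup only runs when no client was passed.
    boto = boto or _boto_from_app_context()
    role_arn = bucket_cred["role-arn"]
    return boto.assume_role(role_arn, expires_in, aws_creds_config)["Credentials"]


cred = {"role-arn": "arn:aws:iam::123456789012:role/example-role"}
print(assume_role(cred, 900, {}, boto=FakeBoto()))
```

Keeping the parameter optional preserves every existing call site inside request handlers, which continue to resolve the client from the app context.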
13 changes: 10 additions & 3 deletions fence/blueprints/oauth2.py
@@ -81,10 +81,17 @@ def authorize(*args, **kwargs):
                 raise UserError("idp {} is not supported".format(idp))
             idp_url = IDP_URL_MAP[idp]
             login_url = "{}/login/{}".format(config.get("BASE_URL"), idp_url)
-            if idp == "shibboleth":
-                shib_idp = flask.request.args.get("shib_idp")
-                if shib_idp:
+
+            # handle valid extra params for fence multi-tenant and shib login
+            fence_idp = flask.request.args.get("fence_idp")
+            shib_idp = flask.request.args.get("shib_idp")
+            if idp == "fence" and fence_idp:
+                params["idp"] = fence_idp
+                if fence_idp == "shibboleth":
                     params["shib_idp"] = shib_idp
+            elif idp == "shibboleth" and shib_idp:
+                params["shib_idp"] = shib_idp
+
         login_url = add_params_to_uri(login_url, params)
         return flask.redirect(login_url)

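The new `/authorize` logic forwards `fence_idp` (renamed to `idp`) to a multi-tenant Fence login, adds `shib_idp` when that upstream IdP is `shibboleth`, and still forwards `shib_idp` for direct `shibboleth` logins. A pure-function sketch of that branching follows. The function name and the example entity ID are hypothetical, it returns only the extra parameters (not whatever `params` already holds in the real handler), and unlike the diff it only forwards `shib_idp` under `fence_idp == "shibboleth"` when a value was actually supplied:

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def extra_login_params(
    idp: str, fence_idp: Optional[str], shib_idp: Optional[str]
) -> Dict[str, str]:
    """Extra query params to forward to the chosen login endpoint."""
    params: Dict[str, str] = {}
    if idp == "fence" and fence_idp:
        # Multi-tenant Fence: tell the upstream Fence which IdP to use
        params["idp"] = fence_idp
        if fence_idp == "shibboleth" and shib_idp:
            params["shib_idp"] = shib_idp
    elif idp == "shibboleth" and shib_idp:
        params["shib_idp"] = shib_idp
    return params


params = extra_login_params("fence", "shibboleth", "urn:mace:incommon:example.edu")
print(urlencode(params))
```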