Merge branch 'master' into fix/authz-signed-urls
Avantol13 committed Dec 6, 2019
2 parents fa2ddf3 + f5a712b commit a8b791a
Showing 19 changed files with 515 additions and 125 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -1,7 +1,7 @@
 # To run: docker run -d -v /path/to/fence-config.yaml:/var/www/fence/fence-config.yaml --name=fence -p 80:80 fence
 # To check running container: docker exec -it fence /bin/bash

-FROM quay.io/cdis/python-nginx:pybase3-1.0.0
+FROM quay.io/cdis/python-nginx:pybase3-1.1.0

 ENV appname=fence

93 changes: 93 additions & 0 deletions docs/fence_shibboleth.md
@@ -0,0 +1,93 @@
# Shibboleth / InCommon login

Shibboleth Single Sign-On and Federating Software is a standards-based, open-source software package for web single sign-on across or within organizational boundaries. The Shibboleth software implements widely used federated identity standards, principally the OASIS Security Assertion Markup Language (SAML), to provide a federated single sign-on and attribute exchange framework. It is made up of three components ([docs](https://wiki.shibboleth.net/confluence/display/CONCEPT/Home)):
- The Identity Provider (IdP) is responsible for user authentication and providing user information to the Service Provider (SP). It is located at the home organization, which is the organization that maintains the user's account.
- The Service Provider (SP) is responsible for protecting an online resource and consuming information from the Identity Provider (IdP). It is located at the resource organization.
- The Discovery Service (DS) helps the Service Provider (SP) discover the user's Identity Provider (IdP). It may be located anywhere on the web and is not required in all cases.

Shibboleth is part of the InCommon Trusted Access Platform, an IAM software suite that is packaged for easy installation and configuration. InCommon operates the identity management federation for U.S. research and education, and its sponsored partners. InCommon uses SAML-based authentication and authorization systems (such as Shibboleth) to enable scalable, trusted collaborations among its community of participants.

To enable InCommon login, Shibboleth must be set up in a multi-tenant Fence instance. Users can then log in through InCommon by specifying the `shib_idp` parameter (supported as of Fence release 4.7.0 and Fence-shib release 2.7.2). If no `shib_idp` is specified (or if an earlier Fence version is used), users are redirected to the NIH login page by default.

Note that in Fence, we use the terms "Shibboleth" and "InCommon" interchangeably.

## Login flow

The `/login/fence` endpoint (the multi-tenant Fence login endpoint) accepts the query parameters `idp` and `shib_idp`. If `idp` is set to `shibboleth`, Fence adds the `idp` and `shib_idp` parameters to the authorization URL before redirecting the user.

The `/authorize` endpoint accepts the query parameters `idp` and `shib_idp`. If `idp` is set to `shibboleth`, Fence adds the `shib_idp` parameter to the login URL before redirecting the user.
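
The forwarding described in the two paragraphs above amounts to copying query parameters onto a redirect URL. The sketch below illustrates it with the standard library only; it is not Fence's implementation, and the helper name `forward_login_params` and the example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def forward_login_params(url, idp, shib_idp, include_idp=True):
    """Hypothetical helper: append the login query parameters to `url`.

    Parameters are only forwarded when `idp` is "shibboleth". Pass
    include_idp=False to forward only `shib_idp`, as `/authorize` does.
    """
    if idp != "shibboleth":
        return url
    parts = urlsplit(url)
    # keep any query parameters that are already on the URL
    query = parse_qsl(parts.query)
    if include_idp:
        query.append(("idp", idp))
    if shib_idp:
        query.append(("shib_idp", shib_idp))
    return urlunsplit(parts._replace(query=urlencode(query)))


# /login/fence style: forward both parameters (example URL is made up)
print(forward_login_params(
    "https://fence.example.org/user/oauth2/authorize?client_id=abc",
    "shibboleth",
    "urn:mace:incommon:uchicago.edu",
))
# https://fence.example.org/user/oauth2/authorize?client_id=abc&idp=shibboleth&shib_idp=urn%3Amace%3Aincommon%3Auchicago.edu
```

`urlencode` takes care of escaping the entity ID (`:` becomes `%3A`), so the receiving endpoint gets the original value back after decoding.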

The `/login/shib` endpoint accepts the query parameter `shib_idp`, which tells Fence which Shibboleth identity provider to use. If no `shib_idp` is specified, NIH is used by default.

After the user logs in and is redirected to `/login/shib/login`, we get the `eppn` (eduPersonPrincipalName) from the request headers and use it as the username. If the `eppn` is not available, we use the `persistent-id` instead.

![Shibboleth Login Flow](images/seq_diagrams/shibboleth_flow.png)

Notes about the NIH login implementation:
- NIH login is used as the default when the `idp` is `fence` and no `shib_idp` is specified (for backwards compatibility).
- NIH login requires special handling because it uses slightly different login endpoints than other InCommon providers.
- When a user logs into NIH with an eRA Commons ID, only the `persistent-id` is returned. For other NIH logins, both `eppn` and `persistent-id` are returned. This is why, when a user logs in through NIH, we use the `persistent-id` as the username even when the `eppn` is provided (for backwards compatibility).
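
The username rules above (which the `fence/blueprints/login/shib.py` change in this commit implements) can be sketched as a plain function. This is a simplified sketch, not Fence's code: the function name `pick_username` is hypothetical, and so is the `persistent_id` header name, since the diff reads that header through a `shib_header` variable whose value is not shown.

```python
NIH_ENTITY_ID = "urn:mace:incommon:nih.gov"


def pick_username(headers, entity_id, persistent_id_header="persistent_id"):
    """Choose a username from Shibboleth headers (simplified sketch).

    `persistent_id_header` is a placeholder for the configured header name.
    """
    eppn = headers.get("eppn")
    # Use eppn only for non-NIH IdPs; NIH logins, and logins with no entity
    # ID (which default to NIH), use persistent-id for backwards compatibility.
    if eppn and entity_id and entity_id != NIH_ENTITY_ID:
        return eppn
    persistent_id = headers.get(persistent_id_header)
    # persistent-id values are "!"-separated; keep the last segment, as the diff does
    username = persistent_id.split("!")[-1] if persistent_id else None
    if not username:
        raise ValueError("Unable to retrieve username")
    return username


# made-up header values for illustration
headers = {
    "eppn": "jdoe@uchicago.edu",
    "persistent_id": "https://idp.example!https://sp.example!ABC123",
}
print(pick_username(headers, "urn:mace:incommon:uchicago.edu"))  # jdoe@uchicago.edu
print(pick_username(headers, NIH_ENTITY_ID))  # ABC123
```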

## Configuration

### In the multi-tenant Fence instance

The image built from the [Shibboleth Dockerfile](../DockerfileShib) is available at https://quay.io/repository/cdis/fence-shib. It is NOT yet compatible with Python 3 or the latest Fence (for now, use Fence 2.7.x).

The deployment only includes `revproxy` and `fenceshib`. The Fence configuration enables the `shibboleth` provider:

```
OPENID_CONNECT:
  shibboleth:
    [...]
ENABLED_IDENTITY_PROVIDERS:
  providers:
    shibboleth:
      name: Shibboleth Login
```

Note that because Fenceshib is not compatible with the latest Fence yet, we must use the deprecated `ENABLED_IDENTITY_PROVIDERS.providers` field instead of the newer `LOGIN_OPTIONS` section.

The Shibboleth configuration can be checked inside the Fenceshib pod under `/etc/shibboleth/`.

**Warning:** Shibboleth login does not work if there is more than one replica, or when logging in through a canary.

### In the Commons that is set up with InCommon login

Register an OIDC client using [this `fence-create` command](https://github.com/uc-cdis/fence#register-internal-oauth-client). The redirect URL should be `<COMMONS_URL>/user/login/fence/login`.

The Fence configuration enables the `fence` provider (multi-tenant Fence setup) with the `shibboleth` provider (provider to be used by the multi-tenant Fence instance):
```
OPENID_CONNECT:
  fence:
    [...]
```

Setup example:
```
LOGIN_OPTIONS:
  - name: 'NIH Login by default'
    idp: fence
  - name: 'NIH Login'
    idp: fence
    fence_idp: shibboleth
    shib_idps:
      - urn:mace:incommon:nih.gov
  - name: 'UChicago Login'
    idp: fence
    fence_idp: shibboleth
    shib_idps:
      - urn:mace:incommon:uchicago.edu
  - name: 'InCommon Login list'
    idp: fence
    fence_idp: shibboleth
    shib_idps:
      - urn:mace:incommon:nih.gov
      - urn:mace:incommon:uchicago.edu
  - name: 'InCommon Login all'
    idp: fence
    fence_idp: shibboleth
    shib_idps: '*'
```

Several login options can use the same provider (`idp`). Each option that uses the `fence` provider with the `shibboleth` Fence provider (`fence_idp`) can list one or more InCommon IdPs in `shib_idps`, _or_ set `shib_idps` to the wildcard string `'*'` to enable all available InCommon IdPs (be careful not to omit the quotes when using the wildcard). If no `shib_idps` are specified, Fence defaults to NIH login.
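
The `shib_idps` rules in the paragraph above can be summarized as a small normalization step. This is a sketch, not Fence's code: `resolve_shib_idps` is a hypothetical name, and `available_idps` stands in for whatever list of InCommon IdPs the deployment knows about. The quotes around the wildcard matter because an unquoted `*` starts a YAML alias, so `shib_idps: *` fails to parse.

```python
NIH_ENTITY_ID = "urn:mace:incommon:nih.gov"


def resolve_shib_idps(login_option, available_idps):
    """Return the InCommon entity IDs a LOGIN_OPTIONS entry should offer (sketch)."""
    shib_idps = login_option.get("shib_idps")
    if not shib_idps:
        # no shib_idps: fall back on NIH login
        return [NIH_ENTITY_ID]
    if shib_idps == "*":
        # wildcard: every available IdP
        return list(available_idps)
    if isinstance(shib_idps, str):
        # a bare string other than "*" is treated as a config mistake in this sketch
        raise ValueError("shib_idps must be a list or the string '*'")
    return list(shib_idps)


available = ["urn:mace:incommon:nih.gov", "urn:mace:incommon:uchicago.edu"]  # hypothetical
print(resolve_shib_idps({"idp": "fence"}, available))
# ['urn:mace:incommon:nih.gov']
print(resolve_shib_idps({"idp": "fence", "fence_idp": "shibboleth", "shib_idps": "*"}, available))
# ['urn:mace:incommon:nih.gov', 'urn:mace:incommon:uchicago.edu']
```
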
Binary file added docs/images/seq_diagrams/shibboleth_flow.png
Binary file added docs/images/usersync.png
161 changes: 161 additions & 0 deletions docs/usersync.md
@@ -0,0 +1,161 @@
# Usersync

Usersync is a script that parses user access information from multiple sources (user.yaml files and dbGaP user authorization telemetry files, also known as whitelists) and keeps users' access to Gen3 resources up to date by updating the Fence and Arborist databases.

## Usersync flow

![Usersync Flow](images/usersync.png)

> Note that at the time of writing, the user.yaml file overrides the access obtained from the telemetry files. In the future, usersync will combine the access instead.
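
The difference between the current "override" behavior and the planned "combine" behavior can be illustrated with two access maps. This is a simplified sketch, not usersync's code: it assumes access is tracked per user and per resource path, and that user.yaml replaces the telemetry permissions for the same user and resource. The real data structures and override granularity may differ.

```python
from typing import Dict, Set

# user -> resource path -> permissions
Access = Dict[str, Dict[str, Set[str]]]


def merge_access(telemetry: Access, user_yaml: Access, combine: bool = False) -> Access:
    """Merge telemetry access with user.yaml access.

    combine=False: user.yaml replaces telemetry permissions for the same
    (user, resource), approximating the current "override" behavior.
    combine=True: permissions are unioned, approximating the planned behavior.
    """
    # copy the telemetry map so the inputs are not mutated
    merged: Access = {
        user: {res: set(perms) for res, perms in resources.items()}
        for user, resources in telemetry.items()
    }
    for user, resources in user_yaml.items():
        user_access = merged.setdefault(user, {})
        for res, perms in resources.items():
            if combine:
                user_access.setdefault(res, set()).update(perms)
            else:
                user_access[res] = set(perms)
    return merged


telemetry = {"GHI": {"/programs/phs3": {"read", "read-storage"}}}
from_user_yaml = {"GHI": {"/programs/phs3": {"create"}}}
print(merge_access(telemetry, from_user_yaml)["GHI"])  # {'/programs/phs3': {'create'}}
```

With `combine=True`, GHI would instead keep `read` and `read-storage` on `/programs/phs3` alongside `create`.
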
## Usersync result example

### Example of user.yaml file:

<details>
<summary>Expand user.yaml</summary>

```
# authz information follows the attribute-based access control (ABAC) model
authz:
  resources:
    - name: programs
      subresources:
        - name: myprogram
        - name: phs1
        - name: phs2
        - name: phs3
    - name: 'open'
  policies:
    - id: 'open_data_reader'
      description: 'Read access to open data'
      role_ids:
        - 'reader'
        - 'storage_reader'
      resource_paths: ['/open']
    - id: phs1_phs2_reader
      description: "Read access to phs1 and phs2"
      role_ids:
        - reader
      resource_paths:
        - /programs/phs1
        - /programs/phs2
    - id: phs3_creator
      description: "Create access to phs3"
      role_ids:
        - creator
      resource_paths:
        - /programs/phs3
    - id: phs1_admin
      description: "Admin access to phs1 indexd records"
      role_ids:
        - indexd_record_admin
      resource_paths:
        - /programs/phs1
  roles:
    - id: reader
      description: ""
      permissions:
        - id: reader
          action:
            method: read
            service: "*"
    - id: storage_reader
      description: ""
      permissions:
        - id: storage_reader
          action:
            method: read-storage
            service: "*"
    - id: creator
      description: ""
      permissions:
        - id: creator
          action:
            method: create
            service: "*"
    - id: indexd_record_admin
      description: ""
      permissions:
        - id: indexd_record_admin
          action:
            method: "*"
            service: indexd
  # policies automatically given to anyone, even if they are not authenticated
  anonymous_policies:
    - open_data_reader
  # policies automatically given to authenticated users (in addition to their other policies)
  all_users_policies: []
  groups:
    - name: phs1_phs2_readers
      policies:
        - phs1_phs2_reader
      users:
        - ABC
        - DEF
# OIDC clients
clients:
  client1:
    policies:
      - open_data_reader
users:
  ABC:
    # "admin" gives create/update/delete access to programs and projects in Sheepdog
    admin: true
    policies:
      - phs1_admin
    # "projects" is the deprecated way of providing access. We should now use "policies"
    projects:
      - auth_id: myprogram
        privilege:
          - read
          - read-storage
          - write-storage
  GHI:
    admin: false
    policies:
      - phs3_creator
```
</details>

### Example of telemetry file (CSV format):

```
user name, login, authority, role, email, phone, status, phsid, permission set, created
Mr. DEF,DEF,eRA,PI,def@com,"123-456-789",active,phs3.v2.p3.c4,"General Research Use",2013-03-19 12:32:12.600
Mrs. GHI,GHI,eRA,PI,ghi@com,"123-456-789",active,phs3.v2.p3.c4,"General Research Use",2013-03-19 12:32:12.600
```

Usersync gives users "read" and "read-storage" permissions to the dbGaP studies.
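
A rough sketch of that mapping using Python's `csv` module follows. It is a simplification, not usersync's parser: it ignores consent codes and every column except `login` and `phsid`, assumes the study ID is the part of `phsid` before the first dot, and assumes the `/programs/<study>` resource path used in the resulting access list.

```python
import csv
import io


def parse_telemetry(text):
    """Map each login in a telemetry CSV to {resource path: permissions} (sketch)."""
    # skipinitialspace strips the spaces after the commas in the header row
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    access = {}
    for row in reader:
        study = row["phsid"].split(".")[0]  # "phs3.v2.p3.c4" -> "phs3"
        path = "/programs/" + study
        access.setdefault(row["login"], {}).setdefault(path, set()).update(
            {"read", "read-storage"}
        )
    return access


TELEMETRY = """user name, login, authority, role, email, phone, status, phsid, permission set, created
Mr. DEF,DEF,eRA,PI,def@com,"123-456-789",active,phs3.v2.p3.c4,"General Research Use",2013-03-19 12:32:12.600
"""
print(sorted(parse_telemetry(TELEMETRY)["DEF"]))  # ['/programs/phs3']
```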

> Note: The dbGaP telemetry files contain consent codes that can be parsed by usersync: [more details here](dbgap_info.md). This simplified example does not include consent code parsing.
### Resulting access:

- user ABC:
  - /open: read + read-storage
  - /programs/myprogram: read, read-storage, write-storage
  - /programs/phs1: read + all methods on indexd
  - /programs/phs2: read
- user DEF:
  - /open: read + read-storage
  - /programs/phs1: read
  - /programs/phs2: read
  - /programs/phs3: read + read-storage _(from the telemetry file)_
- user GHI:
  - /programs/phs3: create _(user.yaml access overrides telemetry file access)_
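
The user.yaml part of this result can be reproduced by resolving policies through roles. The sketch below is a simplification, not usersync's or Arborist's logic: it takes user.yaml as an already-parsed dictionary (for example, loaded with a YAML parser), collects the `anonymous_policies`, the `all_users_policies`, the user's groups' policies and the user's own policies, and labels each permission as `method@service`. The deprecated `projects` field and the telemetry files are not handled.

```python
from collections import defaultdict


def resolve_policies(user_yaml, username):
    """Return {resource path: {"method@service", ...}} for one user (sketch)."""
    authz = user_yaml["authz"]
    roles = {role["id"]: role for role in authz.get("roles", [])}
    policies = {policy["id"]: policy for policy in authz.get("policies", [])}

    # policies given to everyone, then group policies, then the user's own
    policy_ids = list(authz.get("anonymous_policies", []))
    policy_ids += authz.get("all_users_policies", [])
    for group in authz.get("groups", []):
        if username in group.get("users", []):
            policy_ids += group.get("policies", [])
    policy_ids += user_yaml.get("users", {}).get(username, {}).get("policies", [])

    access = defaultdict(set)
    for policy_id in policy_ids:
        policy = policies[policy_id]
        for role_id in policy["role_ids"]:
            for permission in roles[role_id]["permissions"]:
                action = permission["action"]
                label = "{}@{}".format(action["method"], action["service"])
                for path in policy["resource_paths"]:
                    access[path].add(label)
    return dict(access)
```

Applied to the user.yaml example and `"ABC"`, this yields `read@*` and `read-storage@*` on `/open`, `read@*` and `*@indexd` on `/programs/phs1`, and `read@*` on `/programs/phs2`. That matches the list above except for the `projects`-based access to `/programs/myprogram`, which the sketch leaves out.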

## Validation

The [gen3users CLI](https://github.com/uc-cdis/gen3users) includes a user.yaml validation tool:
```
pip install gen3users
gen3users validate user.yaml
```
10 changes: 7 additions & 3 deletions fence/blueprints/login/__init__.py
@@ -56,7 +56,7 @@ def default_login():
         The default root login route.
         """
         # default login option
-        if "DEFAULT_LOGIN_IDP" in config:
+        if config.get("DEFAULT_LOGIN_IDP"):
             default_idp = config["DEFAULT_LOGIN_IDP"]
         elif "default" in config.get("ENABLED_IDENTITY_PROVIDERS", {}):
             # fall back on ENABLED_IDENTITY_PROVIDERS.default
@@ -219,15 +219,19 @@ def provider_info(login_details):
                 provider_info(login_details) for login_details in login_options
             ]
         except KeyError as e:
-            raise InternalError("login options misconfigured: {}".format(e))
+            raise InternalError("LOGIN_OPTIONS misconfigured: {}".format(e))

         # if several login_options are defined for this default IDP, will
         # default to the first one:
         default_provider_info = next(
             (info for info in all_provider_info if info["idp"] == default_idp), None
         )
         if not default_provider_info:
-            raise InternalError("default provider misconfigured")
+            raise InternalError(
+                "default provider misconfigured: DEFAULT_LOGIN_IDP is set to {}, which is not configured in LOGIN_OPTIONS".format(
+                    default_idp
+                )
+            )

         return flask.jsonify(
             {"default_provider": default_provider_info, "providers": all_provider_info}
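
The first change above matters because `config-default.yaml` now sets `DEFAULT_LOGIN_IDP: null`. The key is present, so the old `in` check would select `None` as the default IdP instead of falling back on `ENABLED_IDENTITY_PROVIDERS.default`. The sketch below contrasts the two checks and the default-provider lookup on plain dictionaries. The function names and the final `None` fallback are hypothetical, since the rest of `default_login` is not shown in this diff.

```python
def default_idp_old(config):
    # pre-commit check: true whenever the key exists, even if its value is None
    if "DEFAULT_LOGIN_IDP" in config:
        return config["DEFAULT_LOGIN_IDP"]
    if "default" in config.get("ENABLED_IDENTITY_PROVIDERS", {}):
        return config["ENABLED_IDENTITY_PROVIDERS"]["default"]
    return None  # hypothetical final fallback


def default_idp_new(config):
    # post-commit check: skips a missing, null or empty DEFAULT_LOGIN_IDP
    if config.get("DEFAULT_LOGIN_IDP"):
        return config["DEFAULT_LOGIN_IDP"]
    if "default" in config.get("ENABLED_IDENTITY_PROVIDERS", {}):
        return config["ENABLED_IDENTITY_PROVIDERS"]["default"]
    return None  # hypothetical final fallback


def find_default_provider(all_provider_info, default_idp):
    # same lookup as in the diff: the first entry for the default IdP wins
    info = next((i for i in all_provider_info if i["idp"] == default_idp), None)
    if not info:
        raise LookupError(
            "DEFAULT_LOGIN_IDP is set to {}, which is not configured in "
            "LOGIN_OPTIONS".format(default_idp)
        )
    return info


config = {
    "DEFAULT_LOGIN_IDP": None,  # as in the new config-default.yaml
    "ENABLED_IDENTITY_PROVIDERS": {"default": "fence"},
}
print(default_idp_old(config))  # None
print(default_idp_new(config))  # fence
```
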
10 changes: 7 additions & 3 deletions fence/blueprints/login/shib.py
@@ -33,6 +33,7 @@ def get(self):
         # https://wiki.shibboleth.net/confluence/display/SP3/SSO
         entityID = flask.request.args.get("shib_idp")
         flask.session["entityID"] = entityID
+        # TODO: use OPENID_CONNECT.shibboleth.redirect_url instead of hardcoded
         actual_redirect = config["BASE_URL"] + "/login/shib/login"
         if not entityID or entityID == "urn:mace:incommon:nih.gov":
             # default to SSO_URL from the config which should be NIH login
@@ -56,7 +57,10 @@ def get(self):

         # eppn stands for eduPersonPrincipalName
         username = flask.request.headers.get("eppn")
-        if not username:
+        entityID = flask.session.get("entityID")
+
+        # if eppn not available or logging in through NIH
+        if not username or not entityID or entityID == "urn:mace:incommon:nih.gov":
             persistent_id = flask.request.headers.get(shib_header)
             username = persistent_id.split("!")[-1] if persistent_id else None
             if not username:
@@ -67,8 +71,8 @@ def get(self):
                 raise Unauthorized("Unable to retrieve username")

         idp = IdentityProvider.itrust
-        if flask.session.get("entityID"):
-            idp = flask.session.get("entityID")
+        if entityID:
+            idp = entityID
         login_user(flask.request, username, idp)

         if flask.session.get("redirect"):
2 changes: 1 addition & 1 deletion fence/blueprints/login/synapse.py
@@ -26,7 +26,7 @@ def __init__(self):

     def post_login(self, user, token_result):
         user.id_from_idp = token_result["sub"]
-        user.email = token_result["email_verified"]
+        user.email = token_result["email"]
         user.display_name = "{given_name} {family_name}".format(**token_result)
         info = {}
         if user.additional_info is not None:
8 changes: 6 additions & 2 deletions fence/config-default.yaml
@@ -165,6 +165,10 @@ OPENID_CONNECT:
     # WARNING: DO NOT ENABLE IN PRODUCTION (for testing purposes only)
     mock: false
     mock_default_user: 'test@example.com'
+  shibboleth:
+    client_id: ''
+    client_secret: ''
+    redirect_url: '{{BASE_URL}}/login/shib/login'

 # these are the *possible* scopes a client can be given, NOT scopes that are
 # given to all clients. You can be more restrictive during client creation
@@ -255,7 +259,7 @@ LOGIN_OPTIONS: [] # !!! remove the empty list to enable login options!
 # - Google? Use: '{{BASE_URL}}/login/google'
 # - Multi-tenant fence (e.g. another fence instance)? Use: '{{BASE_URL}}/login/fence'
 # - Sibboleth? Use: '{{BASE_URL}}/login/shib'
-DEFAULT_LOGIN_IDP: google
+DEFAULT_LOGIN_IDP: null
 DEFAULT_LOGIN_URL: '{{BASE_URL}}/login/google'

 # `LOGIN_REDIRECT_WHITELIST` is a list of extra whitelisted URLs which can be redirected
@@ -710,6 +714,6 @@ ALLOWED_USER_SERVICE_ACCOUNT_DOMAINS:
 # seconds if the team matches.
 DREAM_CHALLENGE_TEAM: 'DREAM'
 DREAM_CHALLENGE_GROUP: 'DREAM'
-SYNAPSE_URI: 'https://repo-prod.prod.sagebase.org/auth/v1/'
+SYNAPSE_URI: 'https://repo-prod.prod.sagebase.org/auth/v1'
 SYNAPSE_DISCOVERY_URL:
 SYNAPSE_AUTHZ_TTL: 86400