-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(PXP:9493):(feat)Non-dbgap whitelist in usersync #1020
Conversation
…ustom-data-access
The style in this PR agrees with This formatting comment was generated automatically by a script in uc-cdis/wool. |
Pull Request Test Coverage Report for Build 12645
💛 - Coveralls |
fence/config-default.yaml
Outdated
@@ -904,6 +904,8 @@ SERVICE_ACCOUNT_LIMIT: 6 | |||
GLOBAL_PARSE_VISAS_ON_LOGIN: | |||
# Settings for usersync with visas | |||
USERSYNC: | |||
# allowed list of custom project ids that can be processed by usersync | |||
allowed_custom_project_ids: ["CUSTOM-*"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this necessary? I don't see where this is used
fence/sync/sync_users.py
Outdated
@@ -509,7 +523,7 @@ def _parse_csv(self, file_dict, sess, dbgap_config={}, encrypted=True): | |||
continue | |||
|
|||
phsid_privileges = {} | |||
phsid = row.get("phsid", "").split(".") | |||
phsid = row.get("phsid", row.get("project_id", "")).split(".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the change here? does that row exist in dbgap files? If so this is potentially a change in behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I see the new file might have a different row name. This should be configuration
tests/test-fence-config.yaml
Outdated
@@ -341,6 +341,8 @@ dbGaP: | |||
proxy_user: '' | |||
protocol: 'sftp' | |||
decrypt_key: '' | |||
allow_non_dbGaP_whitelist: false | |||
allowed_id_patterns: ['PROJECT-12345'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is somewhat confusingly named, since this is for file matching. I think this should be a full regex to match the filename and default it to ['authentication_file_phs(\d{6})', 'authentication_file_PROJECT-*']'
or something. There's too much custom logic in the code for someone to know by looking at the configuration how to set this up or what it means. Probably deserves a comment about it as well
tests/test-fence-config.yaml
Outdated
@@ -368,6 +370,8 @@ dbGaP: | |||
'phs000178': ['/orgA/', '/orgB/', '/'] | |||
# study when parsing consent codes | |||
'phs000178.c2': ['/orgA/', '/orgB/', '/'] | |||
# non dbgap study: | |||
'PROJECT-12345': [''] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is optional?
pattern += ".enc" | ||
pattern += "$" | ||
if re.match(pattern, os.path.basename(filepath)): | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this intended to return early on the first matched pattern? Or should this only return True if ALL patterns match that are passed in?
fence/sync/sync_users.py
Outdated
@@ -527,7 +543,11 @@ def _parse_csv(self, file_dict, sess, dbgap_config={}, encrypted=True): | |||
continue | |||
|
|||
phsid_privileges = {} | |||
phsid = row.get("phsid", "").split(".") | |||
if self.non_dbGaP_whitelist: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should you check allow_non_dbgap_whitelist
?
tests/dbgap_sync/test_user_sync.py
Outdated
) | ||
monkeypatch.setattr(syncer, "non_dbGaP_whitelist", allow_non_dbgap_whitelist) | ||
# merge sftp server containing both dbgap and non-dbgap whitelists | ||
monkeypatch.setattr(syncer, "is_sync_from_dbgap_server", allow_non_dbgap_whitelist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used?
tests/dbgap_sync/test_user_sync.py
Outdated
@@ -633,7 +711,7 @@ def mock_merge(dbgap_servers, sess): | |||
|
|||
# this function will be called once for each sftp server | |||
# the test config file has 2 dbgap sftp servers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# the test config file has 2 dbgap sftp servers | |
# the test config file has 3 dbgap sftp servers |
tests/test-fence-config.yaml
Outdated
@@ -662,6 +676,10 @@ EXPIRED_AUTHZ_REMOVAL_JOB_FREQ_IN_SECONDS: 1 | |||
GLOBAL_PARSE_VISAS_ON_LOGIN: | |||
# Settings for usersync with visas | |||
USERSYNC: | |||
non_dbgap_whitelist_enabled: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why have both this and the per-sftp config allow_non_dbGaP_whitelist
?
tests/test-fence-config.yaml
Outdated
@@ -662,6 +676,9 @@ EXPIRED_AUTHZ_REMOVAL_JOB_FREQ_IN_SECONDS: 1 | |||
GLOBAL_PARSE_VISAS_ON_LOGIN: | |||
# Settings for usersync with visas | |||
USERSYNC: | |||
sync_from_visas: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these were removed since our merge of feat/passport, can you double check we actually need this and the one below?
tests/dbgap_sync/test_user_sync.py
Outdated
@pytest.mark.parametrize("syncer", ["google", "cleversafe"], indirect=True) | ||
@pytest.mark.parametrize("parse_consent_code_config", [False, True]) | ||
@pytest.mark.parametrize("parse_consent_code_config", [True, False]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just to be consistently passing true then false or was there some real issue when it was ordered the other way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh woops. I was messing around with it and reversed the order. I'll change it back.
…fence into feat/custom-data-access
fence/sync/sync_users.py
Outdated
pattern += "$" | ||
pattern = pattern.encode().decode( | ||
"unicode_escape" | ||
) # when converting from yaml, python reads it as Python string literal. So "\" turns into "\\" which messes with the regex match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this above the line, I generally try to avoid inline comments unless absolutely necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, why is there YAML at all now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh should I keep the comment and instead of "yaml", call it fence-config or should I just completely get rid of the comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment is helpful, let's clarify that it's YAML from the fence-config
@@ -527,9 +542,13 @@ def _parse_csv(self, file_dict, sess, dbgap_config={}, encrypted=True): | |||
continue | |||
|
|||
phsid_privileges = {} | |||
phsid = row.get("phsid", "").split(".") | |||
if dbgap_config.get("allow_non_dbGaP_whitelist", False): | |||
phsid = row.get("phsid", row.get("project_id", "")).split(".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be configurable? Or I guess we can just require that people either use phsid
or project_id
, but this feels like something that could be a cfg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the way usersync is written right now makes ait a little difficult for us to have a per sftp server for picking phsid
and project_id
. We could include it in the next design for usersync that i want to do for fix-it-Friday though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that's fine
def _merge_multiple_local_csv_files( | ||
self, dbgap_file_list, encrypted, dbgap_configs, session | ||
): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a quick docstring here?
…ustom-data-access
New Features