Skip to content

Conversation

@maxhora
Copy link
Contributor

@maxhora maxhora commented Jan 31, 2020

iterative/PyDrive2#7 should be merged first

Introduces gdrive_service_account_email, gdrive_service_account_user_email and gdrive_service_account_p12_file_path DVC config params to support authorization with Google Service accounts.

Introduced config params are independent from client_id and client_secret config params

Update:

iterative/PyDrive2#11 should be merged first

Introduces gdrive_user_service_account DVC config param disabled by default.

@maxhora maxhora self-assigned this Jan 31, 2020
@codecov
Copy link

codecov bot commented Jan 31, 2020

Codecov Report

Merging #3269 into master will increase coverage by 0.00%.
The diff coverage is 89.47%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3269   +/-   ##
=======================================
  Coverage   91.30%   91.30%           
=======================================
  Files         141      141           
  Lines        8582     8594   +12     
=======================================
+ Hits         7836     7847   +11     
- Misses        746      747    +1     
Impacted Files Coverage Δ
dvc/config.py 96.81% <ø> (ø)
dvc/remote/gdrive.py 86.36% <89.47%> (+0.30%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea981ae...8441ca5. Read the comment docs.

@efiop
Copy link
Contributor

efiop commented Jan 31, 2020

For the record: PyDrive2 1.4.4 has been released.

@efiop
Copy link
Contributor

efiop commented Feb 4, 2020

@Maxris Any progress on this?

@maxhora
Copy link
Contributor Author

maxhora commented Feb 4, 2020

@efiop I believe, it should be evaluated/tested by @shcheklein
Also it is turned out that refresh token is not generated for Service accounts. Since PyDrive supports only old p12 keys to do Service account auth, Travis setup will require additional changes (bundling of p12 cert with non-default password; maybe limiting GDrive API scope only to drive.appdata).

@shcheklein
Copy link
Contributor

@Maxris @efiop we add a lot of code, but I'm not convinced it solves the testing. At least for now. It does not solve the limits issue, unless I'm missing something? What else am I missing?

@maxhora
Copy link
Contributor Author

maxhora commented Feb 4, 2020

@shcheklein yes, it seems it doesn't help to solve limits issues, since Service Account impersonate some another user in domain and the limits are the same.
These changes might be considered as "New functionality" to support login with Service Account. While it is not ready to be automated on Travis, it works perfectly locally.

@maxhora
Copy link
Contributor Author

maxhora commented Feb 4, 2020

These changes might be considered as "New functionality" to support login with Service Account. While it is not ready to be automated on Travis, it works perfectly locally.

Locally user specified path to p12 key and PyDrive automatically retrieves new access_token for Service Account after old one is expired. The difference to the auth for normal Google account is that in addition to an access_token a refresh_token is obtained, and PyDrive automatically retrieves a new access_token by using a refresh_token value.

@shcheklein
Copy link
Contributor

@Maxris not sure I understood the last part about refresh tokens and differences, could you please elaborate a bit?

These changes might be considered as "New functionality" to support login with Service Account. While it is not ready to be automated on Travis, it works perfectly locally.

what would the use case for this? automation? like in CI/CD or something?

While it is not ready to be automated on Travis, it works perfectly locally.

what are we missing here?

@maxhora
Copy link
Contributor Author

maxhora commented Feb 4, 2020

@shcheklein

what would the use case for this? automation? like in CI/CD or something?

Yes, use case is to run tests on Travis with Service Account.

what are we missing here?

p12 key with custom password to do auth with Service Account from Iterative's G Suite and secure way (probably, one another env var) to pass custom password into PyDrive.ServiceAuth()

@shcheklein
Copy link
Contributor

@Maxris do service accounts have their own GDrive space?

could you please elaborate on that difference in the token refresh mechanism, please? I still don't get it.

@maxhora
Copy link
Contributor Author

maxhora commented Feb 4, 2020

@shcheklein

@Maxris do service accounts have their own GDrive space?

No, Service account is a "resource" which has permissions to access other resources.

This PR was tested with following flow https://developers.google.com/identity/protocols/OAuth2ServiceAccount#delegatingauthority . As I understand, Service account receives access to any domain's data on behalf of any domain's user, but the access scope for this Service account can be limited to list of desired access scopes (for example, only https://www.googleapis.com/auth/drive ).

could you please elaborate on that difference in the token refresh mechanism, please? I still don't get it.

For usual Google drive user generated gdrive-user-credentials.json contains both access_token and refresh_token. Just content of gdrive-user-credentials.json is enough to have unlimited in time access to the user's Google Drive.

For Service account generated gdrive-user-credentials.json contains only access_token which is valid only for 1 hour. After access_token is expired, it is supposed that p12 key file is available around to ask Google API to generate new access_token.

btw, in that Google's doc they warn:

Always discourage developers from checking keys into the source code or leaving them in the Downloads directory of their workstation.

@shcheklein
Copy link
Contributor

@Maxris Ok, makes sense. I checked the page and it looks like there is an option to not involve user into this 2LO vs 3LO. Don't you think for automation and testing purposes it would be easier to use 2LO from that doc? I don't see a need to impersonate someone through a service account, at least it does not look superior to the regular service account. Unless I'm missing something again.

@maxhora
Copy link
Contributor Author

maxhora commented Feb 5, 2020

@Maxris Ok, makes sense. I checked the page and it looks like there is an option to not involve user into this 2LO vs 3LO. Don't you think for automation and testing purposes it would be easier to use 2LO from that doc? I don't see a need to impersonate someone through a service account, at least it does not look superior to the regular service account. Unless I'm missing something again.

Good point!
So far, I was not able to find any evidences that in case of 2LO a Service account can access Google Drive storage (Google Drive storage, seems, should belong to the Google Project, but not to specific user, but I was not able to find the technical way to achieve that so far)

Another even more important thing is that PyDrive expects all 3 following params to be available https://github.com/iterative/PyDrive2/blob/master/pydrive2/auth.py#L146
And if, for example, client_user_email (which is used to impersonate real user) is not provided, https://github.com/iterative/PyDrive2/blob/master/pydrive2/auth.py#L416 throws "Insufficient service config in settings". I afraid that PyDrive for the moment doesn't support 2LO since providing of client_user_email always expected.

@maxhora
Copy link
Contributor Author

maxhora commented Feb 10, 2020

Once iterative/PyDrive2#11 is merged, it should be sufficient to set secured GDRIVE_USER_CREDENTIALS_DATA env var in Travis settings for service account and specify service account email only.

@maxhora maxhora force-pushed the gdrive-service-account-support branch from 25919e4 to c209498 Compare February 12, 2020 22:04
@maxhora maxhora force-pushed the gdrive-service-account-support branch from c209498 to 38dcca7 Compare March 6, 2020 20:40
@shcheklein
Copy link
Contributor

@shcheklein I've rebased the branch to master already,
Since config doesn't have global constants anymore, I have defined self.GDRIVE_SERVICE_ACCOUNT_EMAIL locally to reference it twice from inside RemoteGDrive. Is it a restriction now to use constants to reference some key from config?

yeah, if you put them in the schema, no need to define upper case constants anymore, just use strings, check similar cases in the gdrive.py


GDRIVE_SERVICE_ACCOUNT_EMAIL = "gdrive_service_account_email"
GDRIVE_SERVICE_ACCOUNT_P12_FILE_PATH = "gdrive_service_account_user_email"
GDRIVE_SERVICE_ACCOUNT_USER_EMAIL = "gdrive_service_account_p12_file_path"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please, see my comment - no need to do separate constants

Copy link
Contributor

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great overall, check the only comment regarding the constants

@maxhora
Copy link
Contributor Author

maxhora commented Mar 7, 2020

@shcheklein this PR is not yet ready, need to setup Service Account and set Travis env var.

@maxhora maxhora changed the title [WIP] gdrive: support Google Service accounts gdrive: support Google Service accounts Mar 9, 2020
@maxhora
Copy link
Contributor Author

maxhora commented Mar 9, 2020

@shcheklein should be ready to be merged

@shcheklein
Copy link
Contributor

@Maxris travis failed, PTAL.

@maxhora maxhora force-pushed the gdrive-service-account-support branch from 2e22601 to 8441ca5 Compare March 10, 2020 07:34
@maxhora
Copy link
Contributor Author

maxhora commented Mar 10, 2020

@shcheklein thanks! Have fixed value of Travis env var and GDrive tests are passing now. There are still random failures of S3 tests (happen on all DVC Travis builds).

@efiop
Copy link
Contributor

efiop commented Mar 10, 2020

S3 issues fixed, restarted the tests for this PR. Let's see.

@shcheklein
Copy link
Contributor

@efiop I think should be good to merge! :)

@efiop efiop merged commit 3ff804b into master Mar 11, 2020
@efiop efiop deleted the gdrive-service-account-support branch March 11, 2020 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants