[Feature Request] Download individual Vault objects instead of complete <ExportItem> #273

Closed
jay-eleven opened this issue May 25, 2022 · 9 comments

jay-eleven commented May 25, 2022

Hi Ross!

Some Vault exports are huuuuuuge. For example, one of my users had a 1.1T Drive Vault export:

jay@cloudshell:~$ gam print vaultexports matter myMatter fields stats.sizeInBytes,status | grep "user@domain.com" | cut -d, -f4-6 | numfmt --to=iec --field=2 -d, | sort
Getting all Vault Exports for Vault Matter: myMatter(xxxxxxx)
Got 5 Vault Exports for Vault Matter: myMatter(xxxxxxx)...
user@domain.com-vault-chat-mbox,8.1K,COMPLETED
user@domain.com-vault-chat-pst,16K,COMPLETED
user@domain.com-vault-drive,1.1T,COMPLETED
user@domain.com-vault-gmail-mbox,256M,COMPLETED
user@domain.com-vault-gmail-pst,279M,COMPLETED

This 1.1T export is made up of almost 90 zip files:

jay@cloudshell:~$ gam show vaultexports matter myMatter | grep "objectName" | grep "drive"
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive-custodian-docid.csv
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive-metadata.xml
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_0.zip
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_1.zip
[...]
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_84.zip
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_85.zip
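
To count the zip files instead of eyeballing the listing, something like the sketch below might do. count_zip_objects is a made-up helper (not a GAM command), and it assumes each file appears on its own "objectName:" line, as in the output above. The sample input is just the six lines shown above, with the elided ones left out.

```shell
#!/usr/bin/env bash
# count_zip_objects is a hypothetical helper, not part of GAM.
# It reads "objectName:" lines on stdin and prints how many end in .zip.
count_zip_objects() {
    # Tolerate trailing whitespace after the file name
    grep 'objectName:' | grep -c '\.zip[[:space:]]*$'
}

# Sample input: the objectName lines visible in the listing above
count_zip_objects <<'EOF'
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive-custodian-docid.csv
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive-metadata.xml
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_0.zip
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_1.zip
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_84.zip
          objectName: xxxx/exportly-yyy/user@domain.com-vault-drive_85.zip
EOF
```

Against the real export, the same function could sit at the end of the gam show vaultexports pipeline above in place of the second grep.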

So, unless I'm missing something, a command like gam download vaultexport <MatterItem> <ExportItem> for this user would require 1.1T of free space on my local drive.

Turns out Vault generates .zip files of ~10-15 GB each, so instead of downloading all of them at once, it would be awesome if I could download them one by one and not need a huge local drive with a ton of free space.

In order to accomplish this, several things need to happen.

  1. gam show vaultexports needs to display a cloudStorageSink.files.objectURI field, built by concatenating bucketName with objectName to form a valid Cloud Storage URI: gs://<bucketName>/<objectName>
  2. gam show vaultexports needs to allow selecting just that field, i.e. fields cloudStorageSink.files.objectURI
  3. gam download vaultexport command needs to be extended to support individual objectURIs. Something like gam download vaultexport <ExportItem> object <objectURI> matter <MatterItem>
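
Point 1 is really just string concatenation. A minimal sketch, assuming the export reports the bucket and the object name as two separate values; object_uri and the bucket name example-bucket are made up for illustration and are not part of GAM:

```shell
#!/usr/bin/env bash
# object_uri is a hypothetical helper, not part of GAM.
# It joins a bucket name and an object name into a Cloud Storage URI.
object_uri() {
    local bucket_name="$1" object_name="$2"
    printf 'gs://%s/%s\n' "$bucket_name" "$object_name"
}

# example-bucket is a placeholder; the object name is taken from the listing above
object_uri "example-bucket" "xxxx/exportly-yyy/user@domain.com-vault-drive_0.zip"
```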

Then something like this would be possible:

  1. Use gam redirect stdout vaultfiles.csv show vaultexports ee matter mm fields cloudStorageSink.files.objectURI to extract all URIs to a file
  2. Do some looping like:
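while IFS= read -r URI
do
    # Local file name; assumes the download is saved under the object's base name
    FILE="${URI##*/}"
    # Download one file
    gam download vaultexport ee object "$URI" matter mm
    # Upload file to Drive
    gam user uu add drivefile localfile "$FILE" parentname pp
    # Delete file
    rm -- "$FILE"
done < vaultfiles.csv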
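
Since the whole point is to get by with little local disk, the loop could also check for free space before each download. A rough sketch of such a guard follows; has_free_space is a made-up helper (not a GAM feature), and the 15 GiB headroom is an assumption based on the zip sizes mentioned above.

```shell
#!/usr/bin/env bash
# has_free_space is a hypothetical helper, not part of GAM.
# Succeeds if the filesystem holding DIR has at least NEEDED_BYTES available.
has_free_space() {
    local needed_bytes="$1" dir="${2:-.}"
    local avail_kb
    # df -P prints POSIX-format output; column 4 of line 2 is available 1K blocks
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ $(( avail_kb * 1024 )) -ge "$needed_bytes" ]
}

# Assumed headroom: 15 GiB, the upper end of the zip sizes seen in this export
NEEDED=$(( 15 * 1024 * 1024 * 1024 ))

if has_free_space "$NEEDED" .; then
    echo "enough space for the next zip"
else
    echo "not enough space, skipping download" >&2
fi
```

Inside the loop, the check would go right before the download step, so a full disk stops the run instead of leaving a truncated zip behind.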

Thoughts?

@taers232c
Owner

Jay,

I'm swamped at the moment; I can get to this next week.

Ross

@taers232c
Owner

@jay-eleven
Collaborator Author

Wow!! I was not expecting such a fast turnaround. I'll test ASAP and report back.

Thanks Ross.

@jay-eleven
Collaborator Author

I've tested this and it works flawlessly.

Thanks Ross!!

@taers232c
Owner

taers232c commented May 31, 2022 via email

@jay-eleven
Collaborator Author

jay-eleven commented May 31, 2022

I'm testing a shell script I wrote and so far it's humming along happily. When it finishes running, I'll update the documentation to show a working example.

@jay-eleven
Collaborator Author

Wiki updated. Take a look, the working example might be a bit overkill and you might want to leave your pseudo code... 😅

@taers232c
Owner

taers232c commented May 31, 2022 via email

@jay-eleven
Collaborator Author

Neither the .csv nor the .xml files refer in any way to the .zip files, so it's impossible to derive from them which particular .zip file to download. Some CSVs contain just one row (not even headers) with the number of exported elements. Others only have MessageIds and Gmail labels, or Drive metadata. Honestly, I can't see why anybody would want to download just one file from Vault; you need all of them to be able to rebuild a mailbox or a Drive folder.
