how to use with Google Cloud Storage? #214

Open
stefangweichinger opened this issue Mar 20, 2023 · 79 comments

@stefangweichinger
Contributor

Is there any documentation or howto for using a Google Cloud Storage bucket?
I was once told that this is possible but never figured out the actual config.

Does anyone use that?

@prajwaltr93
Contributor

Hey,

I believe you can back up to Google Cloud Storage; I did a quick check using a minimal configuration like the one below.

org "MyConfig"
infofile "/amanda/state/curinfo"
logdir "/amanda/state/log"
indexdir "/amanda/state/index"
dumpuser "amandabackup"

amrecover_changer "changer"

define dumptype simple-gnutar-local {
    auth "local"
    compress none
    program "GNUTAR"
}

device_property "S3_HOST" "commondatastorage.googleapis.com"          
device_property "S3_ACCESS_KEY" "<access_key>"                # Your S3 Access Key
device_property "S3_SECRET_KEY" "<secret_key>"  # Your S3 Secret Key
device_property "S3_SSL" "NO"      # you can enable this if you have CA certs. 
tpchanger "chg-multi:s3:<bucket_name>/<folder_name>/<slot-1" # Number of tapes(volumes) 
changerfile  "s3-statefile"                                         
tapetype S3

define tapetype S3 {
    comment "S3 Bucket"
    length 10240 gigabytes # Bucket size 10TB
}

Manually label a volume:

amlabel MyConfig MyConfig-1 slot 1

and amdump should go through.
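
After that, a backup run is started the usual way (a sketch, assuming the config name "MyConfig" from above):

amdump MyConfig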

@stefangweichinger
Contributor Author

@prajwaltr93 thanks a lot, sounds good, and I will test asap.
A quick look makes me ask:

I think I don't have access key and secret key.

My service account key file looks like:

{
  "type": "service_account",
  "project_id": "myproject",
  "private_key_id": "7c82cxxxx",
  "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvgIBADAgFxxxxxxx\nln-----END PRIVATE KEY-----\n",
  "client_email": "some@my.tld",
  "client_id": "someid",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/bla.iam.gserviceaccount.com"
}

I don't see how to set the device_property values using that. Can you help here?

Maybe I need a different kind of key(s) from upstream?

@stefangweichinger
Contributor Author

I think I need something like this: https://docs.simplebackups.com/storage-providers/ghsg5GE15AMwMo1qFjUCXn/google-cloud-storage-s3-compatible-credentials/8ZKUSSJRJxA4mU4VdxvRfo

I asked the responsible person to generate those keys for me.

@prajwaltr93
Contributor

Using access_key and secret_key should be the most straightforward approach, but a quick look at the code in s3.h reveals that Amanda does support another authentication method, which uses refresh_token, client_id, and client_secret to fetch an access_token that is then used to perform the actual request, if STORAGE_API is OAUTH2, i.e. it can be specified like
device_property "STORAGE_API" "OAUTH2" (reference here).

attempt to fetch access token

We should be able to set these as device_property values, referring here.

I don't have these kinds of keys, so I can't really test this hypothesis, but I hope this helps.
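
A sketch of what such a changer definition might look like, using the property names mentioned in this thread (all values are placeholders, and the slot count is an assumption):

define changer cloud {
    tpchanger "chg-multi:s3:<bucket_name>/<prefix>/slot-{1..10}"
    device_property "STORAGE_API" "OAUTH2"
    device_property "CLIENT_ID" "<client_id>"
    device_property "CLIENT_SECRET" "<client_secret>"
    device_property "REFRESH_TOKEN" "<refresh_token>"
    device_property "PROJECT_ID" "<project_id>"
    device_property "S3_HOST" "commondatastorage.googleapis.com"
    changerfile "s3-statefile"
}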

@stefangweichinger
Contributor Author

@prajwaltr93 thanks for investigating. Interesting, but not a 100% match yet; I will see if I can figure something out.

@stefangweichinger
Contributor Author

I think I was able to create the keys via the CLI by running "gcloud auth application-default login"; I now have a JSON file like:

 /root/.config/gcloud/application_default_credentials.json 
{
  "client_id": "xxxx",
  "client_secret": "yyyyy",
  "refresh_token": "zzzzz",
  "type": "authorized_user"
}

Now I try to configure a third "storage" in my amanda.conf using a changer like:

define changer cloud {
        tpchanger "chg-multi:s3:gs://someprefix_backup-daily/demotapes/"
        device_property "CLIENT_ID" "xxxx"
        device_property "CLIENT_SECRET" "yyyyy"
        device_property "PROJECT_ID" "ssssss"
        device_property "REFRESH_TOKEN" "yyyyyy"
        device_property "S3_HOST" "commondatastorage.googleapis.com"
        device_property "STORAGE_API" "OAUTH2"
        #device_property "VERBOSE" "true"
        changerfile  "s3-statefile"
}

My bucket is named "gs://someprefix_backup-daily/", and I wonder how to configure that; currently I get replies like:

slot 1: While creating new S3 bucket: The specified bucket is not valid.: Invalid bucket name: 'gs:' (InvalidBucketName) (HTTP 400)
all slots have been loaded

If I remove the "gs://", Amanda creates a local subdir ... not useful ...

So it seems I am close ... thanks @prajwaltr93
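
For reference, the s3 device expects the bare bucket name without the gs:// scheme, so the changer line would presumably look like this (bucket and folder names taken from the comment above; the slot count is an assumption):

tpchanger "chg-multi:s3:someprefix_backup-daily/demotapes/slot-{1..10}"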

@stefangweichinger
Contributor Author

My latest try is:

tpchanger "chg-multi:s3:project_id-sb/amanda_vtapes/slot-{1..10}"

Still no success, amcheck simply times out now.

I start with the buckets returned by:

# gcloud storage ls
gs://project_id-sb/
gs://project_id-sb_backup-daily/

I don't know where that "-sb" comes from ... might come from the admin creating it for us.

More tomorrow.

@stefangweichinger
Contributor Author

Current error msg with amcheck: "While creating new S3 bucket: Unknown S3 error (None) (HTTP 400)"

@prajwaltr93
Contributor

Quoting the bucket listing from above:

# gcloud storage ls
gs://project_id-sb/
gs://project_id-sb_backup-daily/

From what I know, bucket names and '-' don't go well together; '_' shouldn't be a problem.

@prajwaltr93
Contributor

Current error msg with amcheck: "While creating new S3 bucket: Unknown S3 error (None) (HTTP 400)"

The configuration you are trying seems right to me; I'm not sure what exactly is causing this issue. I will be getting my hands on different types of auth credentials apart from access_key and secret_key (client_id, client_secret, etc.) and will add any findings here if I make a breakthrough. Thank you for posting your findings here.

@stefangweichinger
Contributor Author

I checked the debug logs right now and find:

Wed Mar 22 12:52:20.787223120 2023: pid 2660724: thd-0x558bc8e79e00: amcheck-device: Connection #0 to host (nil) left intact
Wed Mar 22 12:52:20.787237979 2023: pid 2660724: thd-0x558bc8e79e00: amcheck-device: data in 91: {
  "error": "invalid_grant",
  "error_description": "Token has been expired or revoked."
}

Maybe I have to use new credentials, maybe the permissions on the buckets aren't enough (very likely, I already filed a ticket).

@prajwaltr93
Contributor

I think a quick test to see if the creds work would be to perform the following curl request (x, y, z are placeholders):

curl -d "client_id=x&client_secret=y&refresh_token=z&grant_type=refresh_token" -X POST https://accounts.google.com/o/oauth2/token
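
If the credentials are valid, the endpoint should answer with a JSON body along these lines (a sketch; the values are placeholders and the exact field set may vary):

{
  "access_token": "ya29.xxxx",
  "expires_in": 3599,
  "token_type": "Bearer"
}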

@stefangweichinger
Contributor Author

Tried your command, got "invalid_grant". Reran my stuff with "gcloud auth application-default login"; that led me to some "allow Google Auth library" stuff in the browser and some magic connection to my personal Google account. I don't understand this fully; that's why I had disabled it again two days ago.

Now the test command succeeds, at least I think so.

Used the new credentials in amanda.conf.

amcheck-device.debug looks different now; amcheck never succeeds, though.

I think it tries to create bucket(s) and fails ... I will see what I can quote here without publishing secrets.

@stefangweichinger
Contributor Author

No success.

I have:

define changer cloud {
    tpchanger "chg-multi:s3:mybackup-prod-sb/vtapes/slot-{1..9}"
    device_property "CLIENT_ID" "xxxx"
    device_property "CLIENT_SECRET" "yyyy"
    #device_property "CREATE_BUCKET" "YES"
    device_property "MAX_RECV_SPEED" "1000000" # bytes per second
    device_property "MAX_SEND_SPEED" "1000000" # bytes per second
    #device_property "NB_THREADS_BACKUP" "4" # threads
    device_property "PROJECT_ID" "mybackup-prod"
    device_property "REFRESH_TOKEN" "zzzzzz"
    device_property "S3_HOST" "commondatastorage.googleapis.com"
    #device_property "S3_MULTI_PART_UPLOAD" "YES"
    device_property "S3_SSL" "YES"
    device_property "STORAGE_API" "OAUTH2"
    device_property "VERBOSE" "true"
    changerfile "s3-statefile"
}

define storage cloud {
    tpchanger "cloud"
    labelstr "cloud-[0-9][0-9]*"
    autolabel "cloud-%" any
    tapepool "$r"
    runtapes 1
    tapetype "S3"
    #dump-selection ALL FULL
}

I am not able to label a tape; amcheck simply never finishes. Tried different paths etc., no success.
With the original service account I can sync directories and files to the bucket.

One thought is that my account might lack the permission to create new buckets inside the one "parent bucket". I don't know enough about S3 storage to tell.
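
As an aside, buckets cannot be nested in S3 or GCS, so the slot-{1..9} entries would be objects inside the named bucket rather than sub-buckets. The bucket-creation permission itself can be checked directly with the gsutil CLI (a sketch; the bucket name and location here are hypothetical):

gsutil mb -l europe-west3 gs://mybackup-prod-test-bucket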

@prajwaltr93
Contributor

Hey,

I noticed an issue with the code handling the fetching of the access_token; currently testing changes. Will let you know if it fixes this issue.

Thanks.

@stefangweichinger
Contributor Author

I noticed an issue with the code handling the fetching of the access_token; currently testing changes. Will let you know if it fixes this issue.

Sounds promising, looking forward to any news here.

@stefangweichinger
Contributor Author

@prajwaltr93 I've seen your commit. I would have to recompile Amanda to apply that. Does it already make sense to try it, or do you have other changes planned as well?

@stefangweichinger
Contributor Author

Preparing my patched Gentoo ebuild already.

@stefangweichinger
Contributor Author

Did a test; no changed behavior so far. Waiting for the upstream admin to check my S3 credentials etc.

@prajwaltr93
Contributor

Hey, sorry for the delayed response. Yeah, as specified in the MR description, Amanda had trouble reading the access_token; that got fixed, but the subsequent request had issues. I thought it was something to do with the latest curl library installed on my machine, so I was ruling that out. Let me see if it fixes that.

@stefangweichinger
Contributor Author

No problem with the delay; glad you're working on that issue. Yes, somewhere I also read that a downgrade of curl helped with accessing S3 (but I can't quote the exact link now).

@prajwaltr93
Contributor

Turns out Google Cloud Storage does not support HTTP/2 yet, while newer curl negotiates HTTP/2 unless configured otherwise, so I added code to use HTTP/1.1. Now the request goes through but returns 400 Bad Request; I found that the Content-Length header was not accurate, so the request was failing with:

Tue Mar 28 08:44:28.471761992 2023: pid 31620: thd-0x55ee22d60400: amlabel: data in 1555: <!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 400 (Bad Request)!!1</title>
  <style>...</style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>400.</b> <ins>That’s an error.</ins>
  <p>Your client has issued a malformed or illegal request.  <ins>That’s all we know.</ins>

Tue Mar 28 08:44:28.471794354 2023: pid 31620: thd-0x55ee22d60400: amlabel: PUT https://commondatastorage.googleapis.com/gbackupsprajwal1/DailySet1%2Fslot-1special-tapestart failed with 400/None

Need to see what is causing this; it looks to me like this is not a straightforward fix. Need to investigate further.
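
For hand-testing, the same HTTP/1.1 downgrade can be reproduced with the curl CLI (a sketch; the object URL and the local header file are hypothetical):

# force HTTP/1.1 instead of letting curl negotiate HTTP/2 over TLS
curl --http1.1 -X PUT -T tapestart-header \
     https://commondatastorage.googleapis.com/<bucket>/<object>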

@stefangweichinger
Contributor Author

@prajwaltr93 thanks for investigating further. Sounds like a difficult task.
I was pointed at "gcs-fuse" instead: mount the bucket via FUSE and use it like normal storage for vtapes. I might try that as well; see the sketch below.
Still waiting for more feedback from the responsible admin (assigning me the creds etc.).
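
A minimal sketch of that approach, assuming the gcsfuse tool is installed (the bucket name and mount point are hypothetical):

# mount the bucket via FUSE
gcsfuse someprefix_backup-daily /mnt/gcs-vtapes

# then point a plain disk changer at it in amanda.conf:
# tpchanger "chg-disk:/mnt/gcs-vtapes"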

@prajwaltr93
Contributor

Sure. Also, access_key and secret_key seem to work from my testing, so if you can get your hands on those, it should do. I will be looking into making OAUTH2 work in the meantime.

@stefangweichinger
Contributor Author

Sure. Also, access_key and secret_key seem to work from my testing, so if you can get your hands on those, it should do. I will be looking into making OAUTH2 work in the meantime.

I only have the service account key file as mentioned. I don't know if I can generate access_key and secret_key from that. I will research if I find the time.
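
For what it's worth, GCS calls these keys "interoperability" or HMAC keys, and they can be created for a service account via gsutil (a sketch, assuming sufficient IAM rights; the address is a placeholder):

gsutil hmac create <service-account-email>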

@stefangweichinger
Contributor Author

Here is the issue with the curl downgrade: #213 (comment)

@prajwaltr93
Contributor

Hi,

We will be actively working on this in the coming weeks, as we have allocated time for this fix; we will merge and notify you when that happens.

Hope this helps,
Thanks.

@stefangweichinger
Contributor Author

Great to hear, looking forward to a fix and a working setup.

@stefangweichinger
Contributor Author

@prajwaltr93 any progress already?

@stefangweichinger
Contributor Author

Why "AWS4", when it's Google Cloud Storage in my case?

AWS4 here stands for AWS Signature Version 4, which is a standard for authentication; it's common across major cloud storage providers.

Can't read label: While trying to read tapestart header:

Could this be due to use of the latest curl library? Reverting might help.

Downgrading curl now for a test: 7.88.1-r2 ... I'll see in a minute.

@stefangweichinger
Contributor Author

Access ID: GOOG1...

Even the access_key I have access to has this GOOG1 prefix, meaning they are likely the same thing under interchangeable terms.

I was looking into how to generate these keys with the CLI tools for Google Cloud. I am not sure yet.

@stefangweichinger
Contributor Author

curl-7.88.1-r2 doesn't make a difference :-(

The request signature we calculated does not match the signature you provided. Check your Google secret key and signing method.

@stefangweichinger
Contributor Author

Same with curl-7.87.0-r2

@stefangweichinger
Contributor Author

Tried my second HMAC-Keypair. Same errors.

@stefangweichinger
Contributor Author

stefangweichinger commented Jun 1, 2023

More context: this is when I run amcheck.

slot 1: Can't read label: While trying to read tapestart header: The request signature we calculated does not match the signature you provided. Check your Google secret key and signing method. (SignatureDoesNotMatch) (HTTP 403)

This is only with "STORAGE_API" "AWS4".

With "STORAGE_API" "OAUTH2" I get "Missing client_id properties", when I comment out "STORAGE_API", I see:

$ amcheck abt -o storage=cloud
Amanda Tape Server Host Check
-----------------------------
NOTE: Holding disk '/mnt/amhold/abt': 363 GB disk space available, using 363 GB
slot 7: Can't read label: Amanda header not found -- unlabeled volume?
slot 8: Can't read label: Amanda header not found -- unlabeled volume?

$ amlabel abt -o storage=cloud cloud_01 slot 1
Reading label...
Found an empty tape.
Writing label 'cloud_01'...
Error writing label: While writing amanda header: Too many retries; last message was 'S3 Error: Unknown (empty response body)' (None) (CURLcode 92) (HTTP 400) (after 14 retries).
Error writing label: While writing amanda header: Too many retries; last message was 'S3 Error: Unknown (empty response body)' (None) (CURLcode 92) (HTTP 400) (after 14 retries).

@stefangweichinger
Contributor Author

I added

device_property visible "S3_BUCKET_LOCATION"    "europe-west3" # defaults to us-east-1

because the bucket is located there. I assume that's important, but it hasn't fixed things yet.

@prajwaltr93
Contributor

prajwaltr93 commented Jun 1, 2023

I have curl 7.29, which is very old; I will check and see if upgrading to any other version breaks mine.

Also found this: #137 (comment)

the Content-Length header was not accurate

It is similar to the findings I had when using the latest curl library.

@prajwaltr93
Contributor

prajwaltr93 commented Jun 1, 2023

I faced similar issues on my Debian 11 WSL:

amandabackup@workstation:/etc/amanda$ amlabel MyConfig MyConfig-1 slot 1
'/etc/amanda/MyConfig/amanda.conf', line 8: warning: Global changerfile is deprecated, it must be set in the changer section
Reading label...
Error reading volume label: While creating new S3 bucket: The request signature we calculated does not match the signature you provided. Check your Google secret key and signing method. (SignatureDoesNotMatch) (HTTP 403).
Not writing label.
Not writing label.
amandabackup@BSL-BNG-L591:/etc/amanda$ apt list --installed | grep libcurl4

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libcurl4-nss-dev/stable,stable-security,now 7.74.0-1.3+deb11u7 amd64 [installed,automatic]
libcurl4/now 7.74.0-1.3+deb11u3 amd64 [installed,upgradable to: 7.74.0-1.3+deb11u7]
amandabackup@BSL-BNG-L591:/etc/amanda$

So it's narrowing down to the libcurl library.

@stefangweichinger
Contributor Author

Oh, interesting! I thought I was crazy ;-)
I wonder how I can downgrade even further on that Gentoo server.
Additionally: do I have to recompile Amanda (does it include these libraries), or is it enough to have the binary and/or libs in place?

@stefangweichinger
Contributor Author

Trying to compile older versions now: 7.81 didn't help, and neither did 7.79. But I have to recompile Amanda; that's what I missed here. STILL the same errors after compiling Amanda against libcurl-7.79 (at least I assume so; maybe I have to recheck and clean up some things, this was just a quick first shot).

@stefangweichinger
Contributor Author

No success so far. Went back to curl-7.79 (installed manually), recompiled Amanda, etc.
I think I will try it on a Debian server asap. The Gentoo environment will be removed sooner or later anyway.

@stefangweichinger
Contributor Author

It might be easier for me to compile amanda-3.6. In #213 (comment) it was mentioned that 3.6 doesn't have that issue. Please tell me which branch to use.

@prajwaltr93
Contributor

Since you have access_key and secret_key, I would suggest 3_6, since this PR branch is based off of 3_5.

@prajwaltr93
Contributor

do I have to recompile Amanda (does it include these libraries), or is it enough to have the binary and/or libs in place?

Amanda is likely built against the curl library available in /usr/lib/curl/, so I would suggest recompiling.
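
A quick way to check which libcurl an installed binary actually links against (a sketch; the binary path is an assumption, adjust it to your install):

ldd /usr/sbin/amcheck | grep -i curl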

@stefangweichinger
Contributor Author

Since you have access_key and secret_key, I would suggest 3_6, since this PR branch is based off of 3_5.

Thanks. This leads to new problems: autogen fails to determine the platform ... all this leads way too far already.

@stefangweichinger
Contributor Author

Went back to curl-7.63, without success.

@stefangweichinger
Contributor Author

I'd love to see a patched version with curl enabled that brings a working libcurl with it ...
Gentoo in theory gives me plenty of options to fine-tune the compilation, but on the other hand I don't even know which curl really works, and I am rather alone as an Amanda user on Gentoo (plus using S3).

I currently have to fix the old non-S3 backup installation and feel a bit frustrated here. Basically I have a backup of a 3.5.1 release with buggy libcurl that I should be able to roll back to (and that currently also doesn't fully work).

Any ideas? Shouldn't there be a patch for curl maybe? I'm thinking of patching a current curl release or so.

@stefangweichinger
Contributor Author

I will set up a new Amanda installation on a brand-new Debian 11 machine and retry things there.

curl-7.74 there ...

@stefangweichinger
Contributor Author

Opened a bug at Gentoo as well: https://bugs.gentoo.org/907685

@prajwaltr93
Contributor

Any ideas? Shouldn't there be a patch for curl maybe? I'm thinking of patching a current curl release or so.

We have had a ticket tracking this exact issue since it was reported. Fixing Amanda to work with the latest curl library would be great; we haven't gotten into it yet. Will update if any progress is made.

@stefangweichinger
Contributor Author

@prajwaltr93 Why does 3_6 work then? I might try to use that if possible. I could try to come up with a working ebuild for 3_6 and/or build packages for Debian 11. I'd certainly appreciate it if you could provide these Debian packages as well. I also contacted the Debian maintainer for Amanda and asked whether he already has packages for 3_6.

@prajwaltr93
Contributor

prajwaltr93 commented Jun 5, 2023

For Debian there should be a folder called debian under packaging; you should be able to install dependencies using the control file and get a .deb installer by running ./packaging/deb/buildpkg, or ./packaging/deb/buildpkg server if only server binaries are required.

you can build deb packages yourself from instructions above, this is how i usually build and install amanda for debian machines.

Why does 3_6 work then?

I was referring to the issue tracker myself, but I have to investigate.

@stefangweichinger
Contributor Author

For Debian there should be a folder called debian under packaging; you should be able to install dependencies using the control file and get a .deb installer by running ./packaging/deb/buildpkg, or ./packaging/deb/buildpkg server if only server binaries are required.

You can build deb packages yourself from the instructions above; this is how I usually build and install Amanda on Debian machines.

Sure. It would also be great if the Amanda project provided packages for at least the stable releases (3.5.3, anyone?). I'm thinking of build pipelines, GitHub Actions, etc. ... just mentioning ... why aren't these things used?

-> A set of packages which are known to work and can be used/tested by multiple people.

Building for Gentoo is a different issue, I understand ... the current packaging subdir doesn't cover that as far as I can see.

Why does 3_6 work then?

I was referring to the issue tracker myself, but I have to investigate.

Thank you.

@prajwaltr93
Contributor

Sure. It would also be great if the Amanda project provided packages for at least the stable releases (3.5.3, anyone?).

You can get 3.5.3 packages here.

I'm thinking of build pipelines, GitHub Actions, etc. ... just mentioning ... why aren't these things used?

That would be a great addition; let me check with my team if we can do that.

@stefangweichinger
Contributor Author

Sure. It would also be great if the Amanda project provided packages for at least the stable releases (3.5.3, anyone?).

You can get 3.5.3 packages here.

Really? Where?

I'm thinking of build pipelines, GitHub Actions, etc. ... just mentioning ... why aren't these things used?

That would be a great addition; let me check with my team if we can do that.

Sure, go for it. I think it would be great to have a pipeline that builds Amanda with some sane defaults automatically.
Even better: let it build the packages for Debian etc. ... the packaging scripts are there; I assume it shouldn't be that hard. Looking forward to this progress.

@prajwaltr93
Contributor

You can get 3.5.3 packages here.

https://www.zmanda.com/downloads/

Sorry, missed the link there.

@stefangweichinger
Contributor Author

@prajwaltr93 Last time I was at that link, there were only downloads for Debian 8 or so. Thanks, will check these out.
