Push to S3: malformed URL #46
Hello @sparkysean, thanks for opening an issue. That definitely looks like some incorrect concatenation going on; that being said, I don't think it's that particular snippet of code that's causing the issue. I noticed you are using an S3 URL that ends with `.zip`: if airflow-dbt-python is indeed getting to line 174 onward, then the bug is before that, as when using a zip file we shouldn't even be in that block.

Could you please share some of your DAG code? I'm particularly interested in how the `project_dir` argument is set. Also, it would be great if you could share which Airflow, Python, and Airflow Amazon Providers (if running Airflow >= 2.0) versions you are running, as the implementation of the underlying S3 hook differs between them.
I was able to reproduce the issue with Airflow 2.2.2: the hook interprets the entire S3 URL as the key. I think this can be solved by parsing the bucket and key out of the URL before uploading. Unfortunately, our tests were hiding the issue, as we checked for the full URL as the key:

```python
key = hook.check_for_key(
    f"s3://{s3_bucket}/project/project.zip",
    s3_bucket,
)
assert key is True
```

This passes when we upload the whole thing. I truly believe this was caused by the differences between Airflow 1.10 and Airflow 2.x, as I have tested 1.10 and it works fine (meaning no full path is uploaded). Our testing pipeline is much more comprehensive now and should let us know if we break any of the Airflow versions we support, so I will push a patch to fix it. Thanks for reporting it!
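For illustration, here's a minimal sketch (not the actual patch in #48) of the kind of fix described above, assuming the Airflow 2.x Amazon provider's `S3Hook` API; `S3Hook.parse_s3_url` splits an `s3://` URL into its bucket and key, and the local file path is hypothetical:

```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Split the full S3 URL into its bucket and bare object key.
url = "s3://mwaa-test-bucket/dbt/project/dbt.zip"
bucket_name, key = S3Hook.parse_s3_url(url)
# bucket_name == "mwaa-test-bucket", key == "dbt/project/dbt.zip"

# Upload using the bare key, letting the hook build the final URL.
hook = S3Hook()
hook.load_file(
    filename="/tmp/dbt.zip",  # hypothetical local zip of the dbt project
    key=key,                  # bare object key, not the full s3:// URL
    bucket_name=bucket_name,
    replace=True,
)
```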
Closed by #48. Expect the patch in v0.13.1.
Hi @tomasfarias,

Thanks so much for getting back to me! It looks like you've already found a solution for this, but for completeness, here are your questions answered for my specific environment.

MWAA Airflow version: 2.2.2

DAG code:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow_dbt_python.operators.dbt import DbtRunOperator

with DAG(
    'example_dbt_run',
    description='testing out dbt run from MWAA',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2022, 2, 16),
    catchup=False,
) as dag:
    dbt_run = DbtRunOperator(
        task_id="dbt_run",
        project_dir="s3://mwaa-test-bucket/dbt/project/dbt.zip",
        profiles_dir="s3://mwaa-test-bucket/dbt/profile/",
        target="dev",
        profile="snowflake",
        push_dbt_project=True,
    )
```

Thanks for a (ridiculously!) quick turnaround on this, as well as for developing these super useful Airflow hooks/operators in the first place. Looking forward to testing this out in our environment in v0.13.1 🎉
Hi there,

I'm testing out airflow-dbt-python==0.12.0 on AWS MWAA. It's working brilliantly with `push_dbt_project=False`, but fails when trying to push things back to S3 using `push_dbt_project=True`. I definitely have an IAM policy associated with MWAA that grants the correct PutObject permission for the bucket.

Looking at my log output, I think there's something wrong with the arguments being passed into `load_file_handle_replace_error`. The log message written by the operator shows the correct target URL, but when the boto3 `S3UploadFailedError` occurs later, the error output shows a malformed URL.

I had a quick look at the S3 hook code, and I think it may be to do with this bit: `airflow-dbt-python/airflow_dbt_python/hooks/s3.py`, lines 174 to 181 (at commit 5472927).

As I understand it, what's happening here is that the full S3 URL is being passed in as the object key, where it should be just the key itself (assuming the underlying Airflow S3 hook can construct the final S3 URL from the supplied bucket name and key). This probably explains why the logs show a malformed URL: it looks like the bucket name (`mwaa-test-bucket/`) and key (`s3://mwaa-test-bucket/dbt/project/dbt.zip`) are being concatenated.
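As an aside, here is a hypothetical sketch (plain Python, not the library's actual code) of the mix-up described above, showing how passing the full URL as the key yields the malformed path versus the bare key:

```python
# When the full s3:// URL is passed as the object key, joining bucket and
# key reproduces the malformed path from the S3UploadFailedError output.
bucket_name = "mwaa-test-bucket"

bad_key = "s3://mwaa-test-bucket/dbt/project/dbt.zip"  # full URL used as key
print(f"{bucket_name}/{bad_key}")
# mwaa-test-bucket/s3://mwaa-test-bucket/dbt/project/dbt.zip

good_key = "dbt/project/dbt.zip"  # bare key; the hook builds the URL itself
print(f"{bucket_name}/{good_key}")
# mwaa-test-bucket/dbt/project/dbt.zip
```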