Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Support AWS Glue #1416

Closed
darrenhaken opened this Issue Aug 15, 2017 · 35 comments

Comments

Projects
None yet
7 participants
@darrenhaken
Copy link
Contributor

darrenhaken commented Aug 15, 2017

AWS Glue has now become GA https://aws.amazon.com/glue/

It would be nice to have this running through Terraform. Do you accept PR's as I could look to add it?

@Ninir

This comment has been minimized.

Copy link
Collaborator

Ninir commented Aug 15, 2017

Hey @darrenhaken

It seems the service is available in the Go SDK and CRUDs operations are also available.
Feel free to open a PR for that, we'll be more than happy to review & help on that :)

@Ninir Ninir added the enhancement label Aug 15, 2017

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Aug 15, 2017

@Ninir

This comment has been minimized.

Copy link
Collaborator

Ninir commented Aug 15, 2017

Vendor dependencies

Generally speaking, you first need to vendor the dependencies, as exposed here: #1417.

Connection management

To be able to interact with the SDK and thus the API, you would need to hook the connection, in the aws/config.go file. Then, you would be able to get & use it in the resource.

Resource creation

A good start is the plugin provider documentation page, along with the core repository documentation.

At first you would need to create a HCL structure that matches the API. This is about creating a schema, which is an abstraction of the resource itself.

Tests

Adding acceptance tests is a good way of testing without the need of compiling yourself but reproducing a real-world use-case.
I encourage you to check what was already made by other contributors, which is a good start.

Once the test is written, run it:

make testacc TEST=./aws TESTARGS='-run=TestAccAWSResourceGlueClassifier_'

This will run only the test with name prefix being TestAccAWSResourceGlueClassifier_.

Documentation

The documentation is a required step for people to use it. It should be crystal clear about how to use each options, what is required, optional.
Again, I encourage you to check what was already done, will help a lot 😄

Global example

Check out this example about hooking into the SDK, resource, tests, documentation, etc.

Also, as per the 0.10 providers split, you may then need to build your local provider version using the README page to check it locally.

Last bit, if you could open a PR for the vendoring, and then 1 for each resource, it would be awesome.
For instance, 1 PR for the classifier + tests + doc, 1 for the devendpoint + tests + doc, etc.
If you need any review (even in a WIP), just ask :)

Hope all of this is good to you!
Have fun :) 👍

@drewsonne

This comment has been minimized.

Copy link
Contributor

drewsonne commented Jan 12, 2018

I'm (probably) going to work my way through the resources. If anyone has any wishes as to which glue resource they would want next after the above database resource, let me know.

I'm taking requests, but can't promise anything.

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 12, 2018

@drewsonne I'd like the crawlers next and complete the data catalogue part of the product.

Do you need any help working on it?

@bflad

This comment has been minimized.

Copy link
Member

bflad commented Jan 12, 2018

The aws_glue_catalog_database resource has been released in terraform-provider-aws version 1.7.0. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 12, 2018

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 15, 2018

@drewsonne have you done any work on the crawlers? If not I will pick this up.

@drewsonne

This comment has been minimized.

Copy link
Contributor

drewsonne commented Jan 15, 2018

@darrenhaken I have not yet. My first plan was to build a test case with test steps in a similiar manner to the database catalogue. I might get onto this in the week, but let me know how you go and if you start work on this in another repo

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 15, 2018

@drewsonne I'll take a look at the Crawlers. Is the vendor code up to date and contains the AWS SDK API for it?

@drewsonne

This comment has been minimized.

Copy link
Contributor

drewsonne commented Jan 15, 2018

Can't remember, but it's in the vendor json file

@drewsonne

This comment has been minimized.

Copy link
Contributor

drewsonne commented Jan 15, 2018

@darrenhaken I've added a skeleton test case for the crawler here. drewsonne@2dcd3c6 . It's by no means exhaustive, but would definitely be a start.

Not sure how you want to do this, but that's a start for you.

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 16, 2018

@drewsonne thanks, I'll use that to start looking at it.

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 18, 2018

@drewsonne I'm pretty close to having the crawlers finished

@drewsonne

This comment has been minimized.

Copy link
Contributor

drewsonne commented Jan 18, 2018

@darrenhaken sweet, I assume it's the "glue_crawlers" branch in your repo?

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 18, 2018

@drewsonne that's the one. I haven't pushed it because I've been wrangling some issues with defining a schema.

Have you done much with TF before? Maybe you can provide a pointer or two if I show you.

WIP has been pushed now if interested to see progress

@drewsonne

This comment has been minimized.

Copy link
Contributor

drewsonne commented Jan 20, 2018

@darrenhaken I've got a bit of experience writing providers. What are the issues you've been wrangling with the schema specifically?

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 21, 2018

@drewsonne I'm working on creating the Crawler with the required fields Name, DatabaseName and Targets. See https://docs.aws.amazon.com/sdk-for-go/api/service/glue/#CreateCrawlerInput

Here is the struct definition for Targets which is a nested struct within CreateCrawlerInput:

type CrawlerTargets struct {

    // Specifies JDBC targets.
    JdbcTargets []*JdbcTarget `type:"list"`

    // Specifies Amazon S3 targets.
    S3Targets []*S3Target `type:"list"`
    // contains filtered or unexported fields
}

What I am trying to work out is what is the correct TF schema to map to the CrawlerTargets struct. I made an attempt in the acceptance test to do this but I didn't get any luck.

FYI the acceptance test isn't complete yet - you need to have an AWS account with an S3 bucket and an AWS Glue DB.

@drewsonne

This comment has been minimized.

Copy link
Contributor

drewsonne commented Jan 22, 2018

In these cases (and I think there's some precedence for this, but I can not remember where), I would flatten the schema a bit, eg:

"s3_targets": {
	Type:          schema.TypeList,
	Optional:      true,
	MinItems:      1,
	Elem: &schema.Resource{
		Schema: map[string]*schema.Schema{
			"path": {
				Type:     schema.TypeString,
				Required: true,
			},
			"exclusions": {
				Type:     schema.TypeSet,
				Optional: true,
				Elem:     schema.TypeString,
			},
		},
	},
},
"jdbc_targets": {
	Type:          schema.TypeList,
	Optional:      true,
	MinItems:      1,
	Elem: &schema.Resource{
		Schema: map[string]*schema.Schema{
			"connection_name": {
				Type:     schema.TypeString,
				Required: true,
			},
			"path": {
				Type:     schema.TypeString,
				Required: true,
			},
			"exclusions": {
				Type:     schema.TypeSet,
				Optional: true,
				Elem:     schema.TypeString,
			},
		},
	},
},

And then put them back into the schema structure back in the Read/Write/Update handlers. I think this is simpler as the two top-level attributes have very specific schemas, in comparison to either having two different types of objects (jdbc and s3) smushed into one schema, or having an extra layer in the config tree.

Hoping maybe @Ninir, @bflad, or @radeksimko would weigh in on this?

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 22, 2018

@drewsonne I'd thought about flattening the schema. The only drawback is that it affects Required: true fields, you only need s3_targets or jdbc_targets so declaring them as both being required doesn't make sense.

@bflad

This comment has been minimized.

Copy link
Member

bflad commented Jan 22, 2018

In general, we tend to prefer to follow the AWS API/schema unless the user experience can be heavily improved by doing our own implementation due to future maintenance costs.

While we cannot set a Required: true type schema definition to say require one of s3_targets or jdbc_targets (at the moment anyways -- its a decent request we could use in other places as well and probably worth submitting a feature request upstream into Terraform core), we can do the logic in create/update to essentially require one or the other.

e.g. something like this should do the trick

jdbc_targets, jdbc_targets_ok := d.GetOk("jdbc_targets")
s3_targets, s3_targets_ok := d.GetOk("s3_targets")
if !jdbc_targets_ok && !s3_targets_ok {
    return fmt.Errorf("jdbc_targets or s3_targets configuration is required")
}
@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 22, 2018

@bflad that makes sense, I think deviating too much from the AWS API would only cause confusion in the long run. I'll look at combining both yours and @drewsonne suggestions.

@drewsonne

This comment has been minimized.

Copy link
Contributor

drewsonne commented Jan 25, 2018

btw, @darrenhaken if you keep your git branch up to date, I can write some acceptance tests for you with the S3 bucket and the glue db and even run them.

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 25, 2018

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Jan 25, 2018

@cemo

This comment has been minimized.

Copy link

cemo commented Feb 16, 2018

@darrenhaken what is the situation of your work? Any progress?

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Feb 19, 2018

@cemo I’ll start looking at it again this week. I had to flip back to finishing off another TF PR which has been accepted now. Sorry for the delay!

@chwevans

This comment has been minimized.

Copy link

chwevans commented Mar 2, 2018

I'm interested in using this as well, is there any expected timeline?

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Mar 2, 2018

I’ll add crawlers this week (hopefully) with the required fields added. I will link the repo/branch.

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Mar 13, 2018

@drewsonne did you still want to work on this?

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Mar 14, 2018

I've been working on this some more and I have an acceptance test with the following TF:

resource "aws_glue_catalog_crawler" "test" {
	  name = "test"
	  database_name = "db_name"
	  role = "${aws_iam_role.glue.arn}"
	  s3_target {
		path = "s3://bucket"
	  }
	}
	
	resource "aws_iam_role_policy_attachment" "aws-glue-service-role-default-policy-attachment" {
  		policy_arn = "arn:aws:iam::aws:policy/AWSGlueServiceRole"
  		role = "${aws_iam_role.glue.arn}"
	}
	
	resource "aws_iam_role" "glue" {
  		name = "AWSGlueServiceRoleDefault"
  		assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
	}

I receive the following error:

=== RUN   TestAccAWSGlueCrawler_full
--- FAIL: TestAccAWSGlueCrawler_full (13.25s)
	testing.go:513: Step 0 error: Error applying: 2 error(s) occurred:

		* aws_iam_role_policy_attachment.aws-glue-service-role-default-policy-attachment: 1 error(s) occurred:

		* aws_iam_role_policy_attachment.aws-glue-service-role-default-policy-attachment: [WARN] Error attaching policy arn:aws:iam::aws:policy/AWSGlueServiceRole to IAM Role arn:aws:iam::{account_id}:role/AWSGlueServiceRoleDefault: ValidationError: The specified value for roleName is invalid. It must contain only alphanumeric characters and/or the following: +=,.@_-
			status code: 400, request id: 43da944e-27a7-11e8-8481-452f34f141d6
		* aws_glue_catalog_crawler.test: 1 error(s) occurred:

		* aws_glue_catalog_crawler.test: error creating Glue crawler: InvalidInputException: Service is unable to assume role arn:aws:iam::{account_id}:role/AWSGlueServiceRoleDefault. Please verify role's TrustPolicy
			status code: 400, request id: 4405c2e7-27a7-11e8-9a99-111706a35ce0
FAIL
FAIL	github.com/terraform-providers/terraform-provider-aws/aws	13.293s
make: *** [testacc] Error 1

Any ideas?

@bflad

This comment has been minimized.

Copy link
Member

bflad commented Mar 15, 2018

This error is being generated from the AWS API:

		* aws_iam_role_policy_attachment.aws-glue-service-role-default-policy-attachment: [WARN] Error attaching policy arn:aws:iam::aws:policy/AWSGlueServiceRole to IAM Role arn:aws:iam::697329683179:role/AWSGlueServiceRoleDefault: ValidationError: The specified value for roleName is invalid. It must contain only alphanumeric characters and/or the following: +=,.@_-
			status code: 400, request id: 43da944e-27a7-11e8-8481-452f34f141d6

Relevant code:

err := attachPolicyToRole(conn, role, arn)
if err != nil {
return fmt.Errorf("[WARN] Error attaching policy %s to IAM Role %s: %v", arn, role, err)
}

It looks like we're missing some validation to ensure its an IAM role name being passed in instead of a IAM role ARN (or properly handle this situation by parsing the name out of the ARN). We should probably make a separate issue for that. 😉

You can fix your Terraform configuration by changing the role attribute to use aws_iam_role.role.name instead of aws_iam_role.role.arn for now 👍

resource "aws_iam_role_policy_attachment" "aws-glue-service-role-default-policy-attachment" {
  policy_arn = "arn:aws:iam::aws:policy/AWSGlueServiceRole"
  role       = "${aws_iam_role.glue.name}"
}
@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Mar 15, 2018

I've tried the name and I still had the same issue. I'll be in the office in a couple of hours, I'll push my WIP on my fork if you wouldn't mind taking a look? The error is raised by Amazon around the crawler input and something about trust policy.

Any help is appreciated :-)

@darrenhaken

This comment has been minimized.

Copy link
Contributor Author

darrenhaken commented Mar 15, 2018

@bflad here is the changed code:

resource "aws_glue_catalog_crawler" "test" {
	  name = "test"
	  database_name = "db_name"
	  role = "${aws_iam_role.glue.name}"
	  s3_target {
		path = "s3://bucket"
	  }
	}
	
	resource "aws_iam_role_policy_attachment" "aws-glue-service-role-default-policy-attachment" {
  		policy_arn = "arn:aws:iam::aws:policy/AWSGlueServiceRole"
  		role = "${aws_iam_role.glue.name}"
	}
	
	resource "aws_iam_role" "glue" {
  		name = "AWSGlueServiceRoleDefault"
  		assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Effect": "Allow",
      "Sid": ""
    }
  ]
}
EOF
	}

Error is same as the original question and mentions:

Service is unable to assume role arn:aws:iam::{account}:role/AWSGlueServiceRoleDefault. Please verify role's TrustPolicy

You can see the code at:
git@github.com:darrenhaken/terraform-provider-aws.git and the branch is glue_crawlers

@bflad

This comment has been minimized.

Copy link
Member

bflad commented Mar 22, 2018

Hi, everyone! 👋 I'm going to split this work into more manageable issues so that there can be a clear definition of done for various support. Looking at the API, here are what look like relevant resources and data sources for Terraform to manage with the AWS Glue service:

  • New aws_glue_classifier resource: #3873
  • New aws_glue_connection resource: #3874
  • New aws_glue_crawler resource: #3875
  • New aws_glue_dev_endpoint resource: #3876
  • New aws_glue_job resource: #3877
  • New aws_glue_partition resource: #3878
  • New aws_glue_script data source: #3879
  • New aws_glue_table resource: #3880
  • New aws_glue_trigger resource: #3882
  • New aws_glue_user_defined_function resource: #3883

Please follow along the split issues and note any that you would like to contribute in the new issues. Personally, I will be picking up connections and jobs shortly. Thanks!

@bflad bflad closed this Mar 22, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.