			import sys
			from awsglue.transforms import ApplyMapping
			from awsglue.utils import getResolvedOptions
			from pyspark.context import SparkContext
			from awsglue.context import GlueContext
			from awsglue.dynamicframe import DynamicFrame
			from awsglue.job import Job

			args = getResolvedOptions(sys.argv, ['JOB_NAME'])
			sc = SparkContext()
			glueContext = GlueContext(sc)
			spark = glueContext.spark_session
			job = Job(glueContext)
			job.init(args['JOB_NAME'], args)

			# Read the Redshift table into a DynamicFrame; Glue stages the data
			# through the S3 temporary directory given in redshiftTmpDir.
			table = glueContext.create_dynamic_frame.from_options(
				connection_type="redshift",
				connection_options={
					"url": "jdbc:redshift://xxxxx.yyyyy.us-east-1.redshift.amazonaws.com:5439/db",
					"user": "yyyy",
					"password": "yyyyy",
					"dbtable": "schema.table_name",
					"redshiftTmpDir": "s3://aws-glue-temporary-accountnumber-us-east-1/"})

			# Keep only the mapped fields, preserving their names and types.
			applyformat = ApplyMapping.apply(
				frame=table,
				mappings=[
					("field1", "string", "field1", "string"),
					("field2", "string", "field2", "string")],
				transformation_ctx="applyformat")

			# Register the DynamicFrame as a temp view so it can be queried with Spark SQL.
			table.toDF().createOrReplaceTempView("table_df")

			sqlDF = spark.sql("SELECT COUNT(*) FROM table_df")
			sqlDF.show()

			job.commit()

======================================================

The AWS Glue service internally implements the write_dynamic_frame_from_jdbc_conf method for Redshift by writing the Glue DynamicFrame data into multiple CSV files and creating a manifest file in the S3 location specified by the job's temporary directory parameter. These CSV files are then loaded into the Redshift database via the COPY command, using the manifest file. Because CSV does not support the map data type, the Glue job was failing.
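
A minimal sketch of one common workaround, assuming a hypothetical map-typed column named "attributes" and a Glue catalog connection named "redshift-connection": serialize the map to a JSON string with Spark's to_json before writing, so the intermediate CSV files (and the COPY into a VARCHAR column) can represent the value.

			from pyspark.sql.functions import to_json
			from awsglue.dynamicframe import DynamicFrame

			df = table.toDF()
			# Cast the unsupported map column to a JSON string before writing;
			# "attributes" is an assumed column name for illustration.
			df = df.withColumn("attributes", to_json(df["attributes"]))

			writable = DynamicFrame.fromDF(df, glueContext, "writable")
			glueContext.write_dynamic_frame_from_jdbc_conf(
				frame=writable,
				catalog_connection="redshift-connection",  # assumed connection name
				connection_options={"dbtable": "schema.table_name", "database": "db"},
				redshift_tmp_dir="s3://aws-glue-temporary-accountnumber-us-east-1/")

The target Redshift column would then be a VARCHAR holding the JSON text, which the COPY step can load without issue.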

======================================================
