# MongoDB to Redshift Data Transfer

This notebook demonstrates how to use the MongoDB and Redshift connectors in AWS Glue to move data directly from a MongoDB collection into a Redshift table.

### Pulling data from MongoDB
The cell below reads the `orders` collection of the `big_store` database into a Glue DynamicFrame, using the MongoDB connection configured under Data connections.

In [2]:
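from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Reuse the notebook's SparkContext and wrap it in a GlueContext
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# Read the "orders" collection of the "big_store" database into a DynamicFrame,
# using the connection details stored in the Glue connection "Mongodbatlas connection"
mongodb_read = glueContext.create_dynamic_frame.from_options(
    connection_type="mongodb",
    connection_options={
        "connectionName": "Mongodbatlas connection",
        "database": "big_store",
        "collection": "orders",
        # SinglePartitionPartitioner reads the whole collection as one partition, so the
        # size/key partitionerOptions below are effectively unused with this partitioner
        "partitioner": "com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner",
        "partitionerOptions.partitionSizeMB": "10",
        "partitionerOptions.partitionKey": "_id",
        "disableUpdateUri": "false",
    },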
)
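
With `SinglePartitionPartitioner`, the whole collection is read through one Spark partition, which is simple but limits parallelism on large collections. Options like `partitionSizeMB` and `partitionKey` matter for partitioners that split a collection into key ranges, for example the connector's `PaginateBySizePartitioner`. The sketch below is a conceptual illustration only: it groups a hypothetical in-memory list of `(key, size_in_bytes)` pairs into contiguous key ranges under a size budget, with integers standing in for `_id` values. It is not the MongoDB Spark connector's implementation.

```python
from typing import List, Optional, Tuple


def range_partitions(docs: List[Tuple[int, int]], partition_size_mb: float) -> List[Tuple[int, int]]:
    """Group (key, size_in_bytes) pairs into contiguous, inclusive key ranges whose
    total size stays within partition_size_mb (a single oversized document still
    gets its own range). Hypothetical helper for illustration."""
    limit = partition_size_mb * 1024 * 1024  # budget per partition, in bytes
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end: Optional[int] = None
    used = 0
    for key, size in sorted(docs):  # order by the partition key, like _id
        if start is not None and used + size > limit:
            ranges.append((start, end))  # current range is full; close it
            start, used = None, 0
        if start is None:
            start = key  # open a new range at this key
        end = key
        used += size
    if start is not None:
        ranges.append((start, end))  # close the last open range
    return ranges


docs = [(i, 4 * 1024 * 1024) for i in range(5)]  # five hypothetical 4 MB documents
print(range_partitions(docs, partition_size_mb=10))    # [(0, 1), (2, 3), (4, 4)]
print(range_partitions(docs, partition_size_mb=1000))  # [(0, 4)], i.e. a single partition
```

Each tuple is an inclusive key range that one Spark task could read on its own; a single partition is the degenerate case where one range spans the whole collection.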

### Printing the MongoDB Schema

In [1]:
mongodb_read.printSchema()
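
MongoDB does not enforce a schema, so documents in `orders` may store the same field with different types. When a Glue DynamicFrame sees conflicting types for one field, it records a choice type instead of failing, and `resolveChoice` (for example `mongodb_read.resolveChoice(specs=[("amount", "cast:double")])`, where `amount` is a hypothetical field) can settle it before the Redshift write. The pure-Python sketch below only illustrates the idea of spotting such conflicts in sample documents; the `find_type_conflicts` helper and the sample data are hypothetical, and this is not how Glue infers schemas internally.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def find_type_conflicts(docs: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Return top-level fields whose values appear with more than one Python type."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        for field, value in doc.items():
            seen[field].add(type(value).__name__)  # e.g. "int", "str", "float"
    return {field: types for field, types in seen.items() if len(types) > 1}


sample = [
    {"_id": 1, "amount": 19.99, "status": "shipped"},
    {"_id": 2, "amount": "25.00", "status": "pending"},  # amount stored as a string
    {"_id": 3, "amount": 7, "status": "shipped"},
]
print({field: sorted(types) for field, types in find_type_conflicts(sample).items()})
# {'amount': ['float', 'int', 'str']}
```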

### Previewing the data as a DataFrame

To inspect the incoming records, convert the DynamicFrame to a Spark DataFrame with `toDF()` and call `show()`, which prints the first 20 rows and truncates long values.

In [2]:
mongodb_read.toDF().show()
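
Nested MongoDB sub-documents come through as struct columns in the DataFrame. Redshift tables are flat, so nested fields are usually flattened into separate columns before loading; in Glue this is commonly done with the `Relationalize` transform or by selecting nested fields explicitly. The sketch below illustrates the flattening idea on plain Python dicts, assuming a hypothetical `flatten_document` helper, an example `shipping` sub-document, and `_` as the column separator; it does not reproduce `Relationalize`'s naming rules or its handling of arrays.

```python
from typing import Any, Dict


def flatten_document(doc: Dict[str, Any], parent: str = "", sep: str = "_") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with sep (hypothetical helper).
    Lists are left as-is; they would need separate handling, e.g. a child table."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten_document(value, name, sep))  # recurse into sub-document
        else:
            flat[name] = value
    return flat


order = {"_id": 1, "total": 42.5, "shipping": {"city": "Berlin", "address": {"zip": "10115"}}}
print(flatten_document(order))
# {'_id': 1, 'total': 42.5, 'shipping_city': 'Berlin', 'shipping_address_zip': '10115'}
```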

### Writing data to Redshift
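Finally, the DynamicFrame is written to Redshift through the Redshift connection configured in the Glue Data Catalog. Glue stages the data in the S3 directory given by `redshift_tmp_dir` and then loads it into the target table.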

In [3]:
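# Options for the target table; the JDBC URL comes from the Glue connection below
redshift_options = {
    "dbtable": "public.orders",  # Destination table (schema.table) in Redshift
    "database": "dev",           # Redshift database name
    "user": "admin",
    "password": "Test#123",      # Demo only: avoid hard-coding credentials in notebooks
    "connectTimeout": 10000,
}

# Write the DynamicFrame to Redshift; Glue stages the data under redshift_tmp_dir on S3
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=mongodb_read,
    catalog_connection="Redshift connection",  # Glue Data Catalog connection name for Redshift
    connection_options=redshift_options,
    redshift_tmp_dir="s3://utsav-demo-partner/",  # S3 path for temporary files (required)
)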

print("Data Written to Redshift!")
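
The write cell passes `user` and `password` in plain text. One common alternative is to keep them in AWS Secrets Manager and fetch them at run time; if the Glue connection `Redshift connection` already stores credentials, they can likely be dropped from `redshift_options` altogether. The sketch below assumes a hypothetical secret named `redshift/demo-admin` whose JSON contains `username` and `password` keys (the layout typically used for Redshift credential secrets), and accepts an injected client so it can be exercised without AWS access.

```python
import json
from typing import Any, Dict, Optional


def get_redshift_credentials(secret_id: str, client: Optional[Any] = None) -> Dict[str, str]:
    """Fetch Redshift user/password from AWS Secrets Manager.

    secret_id is a hypothetical secret name; the secret's JSON is assumed to
    contain "username" and "password" keys.
    """
    if client is None:
        import boto3  # imported lazily so a stand-in client can be passed instead

        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return {"user": secret["username"], "password": secret["password"]}


# Usage inside the notebook (hypothetical secret name):
# redshift_options.update(get_redshift_credentials("redshift/demo-admin"))
```

Injecting the client keeps the function testable and lets the notebook reuse a client configured for a specific region.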