In [105]:
%help


Available Magic Commands

## Sessions Magics
%help | Return a list of descriptions and input types for all magic commands. 
%profile | String | Specify a profile in your aws configuration to use as the credentials provider.
%region | String | Specify the AWS region in which to initialize a session | Default from ~/.aws/configure
%idle_timeout | Int | The number of minutes of inactivity after which a session will timeout. The default idle timeout value is 2880 minutes (48 hours).
%session_id | Returns the session ID for the running session. 
%session_id_prefix | String | Define a String that will precede all session IDs in the format [session_id_prefix]-[session_id]. If a session ID is not provided, a random UUID will be generated.
%status | Returns the status of the current Glue session including its duration, configuration and executing user / role.
%list_sessions | Lists all currently running sessions by name and ID.
%stop_session | Stops the current session.
%glue_version | String 
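
The `%session_id_prefix` entry above describes the ID format `[session_id_prefix]-[session_id]`, with a random UUID generated when no session ID is given. A minimal Python sketch of that naming rule, for illustration only: the helper name `build_session_id` is hypothetical and this is not the kernel's own code.

```python
import uuid


def build_session_id(prefix, session_id=None):
    """Return '<prefix>-<session_id>', generating a random UUID if no ID is given."""
    # Hypothetical helper: mirrors the format described by %help, nothing more.
    if session_id is None:
        session_id = str(uuid.uuid4())
    return f"{prefix}-{session_id}"


print(build_session_id("sandbox", "8b0e800e-6a7e-4409-a289-072016434cb5"))
```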

In [103]:
%number_of_workers 2
%idle_timeout 15
%glue_version 3.0


You are already connected to session 8b0e800e-6a7e-4409-a289-072016434cb5. Your change will not reflect in the current session, but it will affect future new sessions. 

Previous number of workers: 2
Setting new number of workers to: 2
You are already connected to session 8b0e800e-6a7e-4409-a289-072016434cb5. Your change will not reflect in the current session, but it will affect future new sessions. 

Current idle_timeout is 15 minutes.
idle_timeout has been set to 15 minutes.
You are already connected to session 8b0e800e-6a7e-4409-a289-072016434cb5. Your change will not reflect in the current session, but it will affect future new sessions. 

Setting Glue version to: 3.0


In [107]:
%profile sandbox

You are already connected to session 8b0e800e-6a7e-4409-a289-072016434cb5. Your change will not reflect in the current session, but it will affect future new sessions. 

Previous profile: sandbox
Setting new profile to: sandbox


In [13]:
spark

<pyspark.sql.session.SparkSession object at 0x7fedb3f4d2d0>


In [109]:
%status

Session ID: 8b0e800e-6a7e-4409-a289-072016434cb5
Status: READY
Role: arn:aws:iam::127945929854:role/pryan_glue_interactive_role
CreatedOn: 2022-11-02 13:51:04.831000-05:00
GlueVersion: 3.0
Worker Type: G.1X
Number of Workers: 2
Region: us-east-1
Applying the following default arguments:
--glue_kernel_version 0.36
--enable-glue-datacatalog true
Arguments Passed: ['--glue_kernel_version: 0.36', '--enable-glue-datacatalog: true']


Run the following boilerplate code in a cell to import the AWS Glue libraries and create the GlueContext used by the later cells:

In [20]:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# Reuse the session's SparkContext (or create one) and wrap it in a GlueContext.
glueContext = GlueContext(SparkContext.getOrCreate())




Read the publicly available Medicare provider payment data used in the AWS Glue data preparation sample:

In [24]:
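# Load the CSV from S3 into a DynamicFrame, treating the first row as column names.
medicare_dynamicframe = glueContext.create_dynamic_frame.from_options(
    connection_type='s3',
    connection_options={'paths': ['s3://awsglue-datasets/examples/medicare/Medicare_Hospital_Provider.csv']},
    format='csv',
    format_options={'withHeader': True})
print("Count:", medicare_dynamicframe.count())
medicare_dynamicframe.printSchema()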

Count: 163065
root
|-- DRG Definition: string
|-- Provider Id: string
|-- Provider Name: string
|-- Provider Street Address: string
|-- Provider City: string
|-- Provider State: string
|-- Provider Zip Code: string
|-- Hospital Referral Region Description: string
|-- Total Discharges: string
|-- Average Covered Charges: string
|-- Average Total Payments: string
|-- Average Medicare Payments: string
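
Every column in the schema above is reported as `string`: CSV carries no type information, and the header row only supplies column names. The same effect can be seen with Python's standard `csv` module. This is an analogy, not how Glue reads the file, and the two sample rows are made up for illustration.

```python
import csv
import io

# Made-up sample rows, shaped like a few of the Medicare columns above.
sample = io.StringIO(
    "Provider Id,Provider Name,Total Discharges\n"
    "10001,EXAMPLE HOSPITAL A,91\n"
    "10005,EXAMPLE HOSPITAL B,14\n"
)

# DictReader uses the first row as field names, similar in spirit to withHeader.
rows = list(csv.DictReader(sample))

for row in rows:
    # Every value is still a str, including the numeric-looking ones.
    print({name: type(value).__name__ for name, value in row.items()})
```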


Change the data type of the Provider Id column to long, so that every incoming value resolves to long:

In [28]:
medicare_res = medicare_dynamicframe.resolveChoice(specs=[('Provider Id', 'cast:long')])
medicare_res.printSchema()

root
|-- DRG Definition: string
|-- Provider Id: long
|-- Provider Name: string
|-- Provider Street Address: string
|-- Provider City: string
|-- Provider State: string
|-- Provider Zip Code: string
|-- Hospital Referral Region Description: string
|-- Total Discharges: string
|-- Average Covered Charges: string
|-- Average Total Payments: string
|-- Average Medicare Payments: string
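
To make the effect of `cast:long` concrete, here is a plain-Python sketch of casting string values to integers. It assumes that values which cannot be parsed become null, as with Spark's default (non-ANSI) cast; that behaviour is an assumption here, not something this notebook verifies, and `cast_long` is a hypothetical helper.

```python
def cast_long(value):
    """Return value as an int, or None when it cannot be parsed (assumed null behaviour)."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Non-numeric strings and None both map to None in this sketch.
        return None


# Made-up inputs for illustration.
print([cast_long(v) for v in ["10001", "n/a", None]])
```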


Display the first 10 provider names:

In [32]:
medicare_res.toDF().select('Provider Name').show(10, truncate=False)

+-----------------------------------+
|Provider Name                      |
+-----------------------------------+
|SOUTHEAST ALABAMA MEDICAL CENTER   |
|MARSHALL MEDICAL CENTER SOUTH      |
|ELIZA COFFEE MEMORIAL HOSPITAL     |
|ST VINCENT'S EAST                  |
|SHELBY BAPTIST MEDICAL CENTER      |
|BAPTIST MEDICAL CENTER SOUTH       |
|EAST ALABAMA MEDICAL CENTER AND SNF|
|UNIVERSITY OF ALABAMA HOSPITAL     |
|HUNTSVILLE HOSPITAL                |
|GADSDEN REGIONAL MEDICAL CENTER    |
+-----------------------------------+
only showing top 10 rows


In [111]:
%status

Session ID: 8b0e800e-6a7e-4409-a289-072016434cb5
Status: READY
Role: arn:aws:iam::127945929854:role/pryan_glue_interactive_role
CreatedOn: 2022-11-02 13:51:04.831000-05:00
GlueVersion: 3.0
Worker Type: G.1X
Number of Workers: 2
Region: us-east-1
Applying the following default arguments:
--glue_kernel_version 0.36
--enable-glue-datacatalog true
Arguments Passed: ['--glue_kernel_version: 0.36', '--enable-glue-datacatalog: true']


In [115]:
%list_sessions

The first 6 sessions are:
6f2eb17e-e325-4fbc-8104-b3b2a24fd010
6fd5e261-3ab4-42e4-b644-9d1ff83632d6
8063b947-778a-4486-a826-1598b5fb3012
8b0e800e-6a7e-4409-a289-072016434cb5
a809ec9d-79d5-4cc3-8139-96623e5d5e51
c8fe8666-dfc4-4a5b-92e5-1ecfdb14bf89
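
The output above is labelled as the first 6 sessions. If more exist, one way to collect every session ID is to page through the Glue API directly. The sketch below assumes a boto3 Glue client (`boto3.client('glue')`) whose `list_sessions` call returns `Ids` and an optional `NextToken`. The function name `list_all_session_ids` is hypothetical, and the client is passed in as an argument so the function can be exercised without AWS credentials.

```python
def list_all_session_ids(glue_client):
    """Collect session IDs across all pages of list_sessions."""
    session_ids = []
    kwargs = {}
    while True:
        page = glue_client.list_sessions(**kwargs)
        session_ids.extend(page.get("Ids", []))
        token = page.get("NextToken")
        if not token:
            return session_ids
        # Ask for the next page on the following iteration.
        kwargs["NextToken"] = token
```

With credentials configured, this could be called as `list_all_session_ids(boto3.client('glue', region_name='us-east-1'))`.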


In [117]:
%status

Session ID: 8b0e800e-6a7e-4409-a289-072016434cb5
Status: READY
Role: arn:aws:iam::127945929854:role/pryan_glue_interactive_role
CreatedOn: 2022-11-02 13:51:04.831000-05:00
GlueVersion: 3.0
Worker Type: G.1X
Number of Workers: 2
Region: us-east-1
Applying the following default arguments:
--glue_kernel_version 0.36
--enable-glue-datacatalog true
Arguments Passed: ['--glue_kernel_version: 0.36', '--enable-glue-datacatalog: true']


In [121]:
%stop_session

Stopping session: 8b0e800e-6a7e-4409-a289-072016434cb5
Stopped session.


In [124]:
%delete_session

UsageError: Line magic function `%delete_session` not found.
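
As the error shows, the kernel used here (`--glue_kernel_version 0.36`) has no `%delete_session` magic. One possible alternative is the Glue API's `DeleteSession` operation, exposed in boto3 as `delete_session(Id=...)`. A minimal sketch, assuming a boto3 Glue client is passed in; `delete_glue_session` is a hypothetical helper name.

```python
def delete_glue_session(glue_client, session_id):
    """Request deletion of an interactive session and return the ID echoed by the API."""
    response = glue_client.delete_session(Id=session_id)
    # The DeleteSession response carries the ID of the deleted session.
    return response.get("Id")
```

With credentials configured, this could be called as `delete_glue_session(boto3.client('glue', region_name='us-east-1'), '8b0e800e-6a7e-4409-a289-072016434cb5')` for the session stopped in the previous cell.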
