# SageMaker + Athena: Data connection

This notebook shows how to connect a SageMaker notebook to data in Amazon Athena using Python 3.

## Contents

- Installing PyAthena
- Checking AWS Region
- Connecting to Athena
    - Importing libs
    - Creating engine connection
    - Reading data with pandas

## References

[AWS Data Engineering Immersion Day](https://catalog.us-east-1.prod.workshops.aws/workshops/976050cc-0606-4b23-b49f-ca7b8ac4b153/en-US/800/830-athena-ml-usecase)

## Installing PyAthena

In [1]:
!pip install "PyAthena[SQLAlchemy]"

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
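
After the install finishes, it can be useful to confirm that the kernel actually sees the packages. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the import names `pyathena`, `sqlalchemy`, and `pandas` are assumed to be the ones this notebook relies on.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Import names assumed for the packages used later in this notebook
required = ["pyathena", "sqlalchemy", "pandas"]
missing = missing_modules(required)
print("Missing modules:", missing if missing else "none")
```

If the list is not empty, rerunning the install cell is a reasonable first step.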


## Checking AWS region

In [2]:
!aws configure get region

sa-east-1
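
The region can also be looked up from Python instead of being copied by hand. The sketch below assumes the region is exposed through the `AWS_REGION` or `AWS_DEFAULT_REGION` environment variables, which may not be set on every notebook instance; the helper name `resolve_region` and the fallback value are illustrative, and the result is not guaranteed to match what `aws configure get region` reads from the CLI configuration.

```python
import os


def resolve_region(env=None, default="sa-east-1"):
    """Pick a region from AWS_REGION, then AWS_DEFAULT_REGION, else the fallback."""
    env = os.environ if env is None else env
    return env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION") or default


# Falls back to the region printed by the CLI above when no variable is set
detected_region = resolve_region()
print(detected_region)
```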


## Connecting to Athena

Importing the libraries:

In [3]:
from sqlalchemy import create_engine
import pandas as pd

Creating the engine connection to Athena:

In [6]:
# S3 staging directory for Athena query results (the same location configured for your Athena queries)
s3_staging_dir = "s3://aws-athena-results-xxxxx/"
# AWS region of this notebook
aws_region = "sa-east-1"
# Connection string; the credentials are left empty so the default AWS credentials of the notebook are used
connection_string = f"awsathena+rest://:@athena.{aws_region}.amazonaws.com:443/ticketdata?s3_staging_dir={s3_staging_dir}"

# Creating connection engine
engine = create_engine(connection_string)
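
Because the staging directory is passed as a URL query parameter, percent-encoding it is a safer way to build the connection string. The sketch below does this with `urllib.parse.quote_plus` from the standard library; the helper name `build_athena_url` is hypothetical, and the region, schema, and bucket values are the placeholders used in this notebook.

```python
from urllib.parse import quote_plus


def build_athena_url(region, schema, s3_staging_dir):
    """Build the SQLAlchemy URL with empty credentials and an encoded staging directory."""
    # quote_plus percent-encodes ':' and '/' so the S3 URI is a valid query parameter value
    staging = quote_plus(s3_staging_dir)
    return (
        f"awsathena+rest://:@athena.{region}.amazonaws.com:443/"
        f"{schema}?s3_staging_dir={staging}"
    )


url = build_athena_url("sa-east-1", "ticketdata", "s3://aws-athena-results-xxxxx/")
print(url)
```

The resulting string can be passed to `create_engine` in place of `connection_string`.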

In [7]:
# Reading the data with pandas
df = pd.read_sql('SELECT * FROM "testes"."test_table"', engine)
df

Unnamed: 0,name,last
0,William,Lima
1,Marcos,Vinicius
2,Clara,Isabela
