<header style="background:#00233C;padding-left:20pt;padding-right:20pt;padding-top:20pt;padding-bottom:10pt;"><img id="Teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 100px; height: auto; margin-top: 20pt;" align="right">
<p style="font-size:20px; color:#ffffff">UDW INNOVATION DAYS</p>
<p style="font-size:24px; color:#ffffff">Teradata Package for Python: Introduction to teradataml </p>
<p style="font-size:16px; color:#ffffff">Getting Connected.</p>
</header>

#### Install teradataml package
Note: You only need to run this once. The `!` prefix runs a shell command from the notebook cell; uncomment the line below to run it. Restart your Python kernel when the installation completes.

In [None]:
#!pip install teradataml

#### Connection Variables
The standard-library `getpass` module is used to hide authentication strings for the purposes of this demo. You can substitute your preferred authentication method.

In [2]:
# to hide authentication strings
import getpass as gp

##### Set User and Password Variables

In [3]:
user = gp.getpass("User: ")

User:  ········


In [4]:
password = gp.getpass("Password: ")

Password:  ········


##### Set Connection Variables
A best practice is to use variables for your database names or schemas. This way, when you need to move code to production, you can simply change the database value in the variable rather than adjusting all your code.

In [5]:
host = 'UDWTest'
logmech = 'LDAP'
defaultDB = 'INOUDWTRAINING2024' 
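
As a small illustration of this practice, the sketch below builds a fully qualified table name from the `defaultDB` variable, so only one assignment changes when the code moves to another database. The helper name `qualified_name` is hypothetical and not part of teradataml; it is plain Python string formatting.

```python
# Database name kept in one place (same value as the notebook's defaultDB).
defaultDB = 'INOUDWTRAINING2024'

def qualified_name(table, database=None):
    """Return a double-quoted database.table identifier (hypothetical helper)."""
    database = defaultDB if database is None else database
    return f'"{database}"."{table}"'

# Build a query without hard-coding the database name.
sqlStr = f"SELECT COUNT(*) FROM {qualified_name('heartdisease')}"
print(sqlStr)
```

For DataFrame creation, the `in_schema` function from teradataml plays a similar role, for example `DataFrame(in_schema(defaultDB, "heartdisease"))`.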

### teradataml Context Management and Garbage Collection
teradataml offers various DataFrame APIs and analytic functions that enable users to process data and run analytics on it. While doing so, teradataml internally creates database objects (views and tables) on Vantage whenever necessary. These views and tables are referred to as *virtual DataFrames*. They are created temporarily and have a session-wide scope. At the end of the session, the teradataml GarbageCollector removes these internally created views and tables.

Functions of Context Management include the following: 

- `create_context()`:  used to connect to Vantage and initiate a session
- `remove_context()`:  performs garbage collection and closes the session and connection.
- `get_context()`:  returns the Vantage engine associated with the current context.
- `set_context()`:  specifies a Vantage sqlalchemy engine as current context.
- `get_connection()`:  gets the connection object used in the current context.

#### Import context, configure, and DataFrame libraries from the teradataml package.

In [6]:
# for managing context
from teradataml import create_context, remove_context, execute_sql

# for setting configure options
from teradataml import configure

# for teradataml DataFrame object
from teradataml import DataFrame, in_schema

#### Establish your Context
Use the `create_context()` function to create a connection to Vantage using the teradatasql and teradatasqlalchemy DBAPI and dialect combination.
- You can pass all required arguments (host, username, password) to establish a connection to Vantage, or pass a sqlalchemy engine to the tdsqlengine parameter to override the default DBAPI and dialect combination. You can connect to Vantage systems that are enabled with various security mechanisms.
- The optional logdata argument specifies parameters to the LOGMECH command beyond those needed by the logon mechanism, such as user ID, password and tokens (in case of JWT) to successfully authenticate the user.
- The optional database argument specifies the initial database to use after log on, instead of your default database.

To connect to an MLA Database, use the following parameters:

In [8]:
# Reuse the connection variables defined earlier instead of hard-coded values
td_context = create_context(host=host,
                            username=user,
                            password=password,
                            logmech=logmech,
                            database=defaultDB)



#### Create Virtual DataFrame from "heartdisease" table

In [9]:
df = DataFrame("heartdisease") 

#### View top 10 rows

In [10]:
df.head()

ID,HeartDisease,BMI,Smoking,AlcoholDrinking,Stroke,PhysicalHealth,MentalHealth,DiffWalking,Sex,AgeCategory,Race,Diabetic,PhysicalActivity,GenHealth,SleepTime,Asthma,KidneyDisease,SkinCancer
2,No,26.58,Yes,No,No,20.0,30.0,No,Male,65-69,White,Yes,Yes,Fair,8.0,Yes,No,No
4,No,23.71,No,No,No,28.0,0.0,Yes,Female,40-44,White,No,Yes,Very good,8.0,No,No,No
5,Yes,28.87,Yes,No,No,6.0,0.0,Yes,Female,75-79,Black,No,No,Fair,12.0,No,No,No
6,No,21.63,No,No,No,15.0,0.0,No,Female,70-74,White,No,Yes,Fair,4.0,Yes,No,Yes
8,No,26.45,No,No,No,0.0,0.0,No,Female,80 or olde,White,"No, borderline diabetes",No,Fair,5.0,No,Yes,No
9,No,40.69,No,No,No,0.0,0.0,Yes,Male,65-69,White,No,Yes,Good,10.0,No,No,No
7,No,31.64,Yes,No,No,5.0,0.0,Yes,Female,80 or olde,White,Yes,No,Good,9.0,Yes,No,No
3,No,24.21,No,No,No,0.0,0.0,No,Female,75-79,White,No,No,Good,6.0,No,No,Yes
1,No,20.34,No,No,Yes,0.0,0.0,No,Female,80 or olde,White,No,Yes,Very good,7.0,No,No,No
0,No,16.6,Yes,No,No,3.0,30.0,No,Female,55-59,White,Yes,Yes,Very good,5.0,Yes,No,Yes


#### Executing a non-DataFrame SQL Query

In [29]:
sqlStr = "SHOW TABLE heartdisease"
results = execute_sql(sqlStr)
obj = results.fetchall()[0]
obj

['CREATE MULTISET TABLE INOUDWTRAINING2024.heartdisease ,FALLBACK ,\r     NO BEFORE JOURNAL,\r     NO AFTER JOURNAL,\r     CHECKSUM = DEFAULT,\r     DEFAULT MERGEBLOCKRATIO,\r     MAP = TD_MAP1\r     (\r      ID BIGINT,\r      HeartDisease VARCHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC,\r      BMI DECIMAL(10,4),\r      Smoking VARCHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC,\r      AlcoholDrinking VARCHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC,\r      Stroke VARCHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC,\r      PhysicalHealth DECIMAL(10,4),\r      MentalHealth DECIMAL(10,4),\r      DiffWalking VARCHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC,\r      Sex VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,\r      AgeCategory VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,\r      Race VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,\r      Diabetic VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,\r      PhysicalActivity VARCHAR(5) CHARACTER SET LATIN NOT CASESPECIFIC,\r      GenHealt
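
The DDL string returned by `SHOW TABLE` uses carriage returns (`\r`) as line separators, which makes the raw output hard to read. A minimal sketch of printing it line by line follows; `sample_row` is a shortened stand-in for the `obj` row above, not a full query result.

```python
# Shortened stand-in for the row returned by results.fetchall()[0].
sample_row = ['CREATE MULTISET TABLE INOUDWTRAINING2024.heartdisease ,FALLBACK ,\r     NO BEFORE JOURNAL,\r     NO AFTER JOURNAL']

# The DDL text is the first element of the row; normalize \r to \n.
ddl = sample_row[0].replace('\r\n', '\n').replace('\r', '\n')
print(ddl)
```

In the notebook, the same two steps would be applied to `obj[0]` instead of `sample_row[0]`.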

In [30]:
from teradataml import db_list_tables

In [None]:
# List the tables in the database held in the defaultDB variable
db_list_tables(defaultDB)

### Remove Context as a Best Practice
The `remove_context()` function removes the current context associated with the Vantage connection. It closes the corresponding connection object and also garbage collects the intermediate views and tables created by teradataml.

Teradata recommends calling `remove_context()` to end a session, so that these intermediate views and tables are cleaned up.

In [32]:
remove_context()

True
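
One way to make sure `remove_context()` runs even when an error interrupts the analysis is to wrap the session in a context manager. The sketch below is an illustration under stated assumptions, not part of teradataml: `vantage_session` is a hypothetical helper, and the connect and disconnect callables are passed in as arguments so the pattern can be tried with stand-ins and without a live Vantage system.

```python
from contextlib import contextmanager

@contextmanager
def vantage_session(connect, disconnect, **connect_kwargs):
    """Open a session with connect() and always call disconnect() on exit (hypothetical helper)."""
    engine = connect(**connect_kwargs)
    try:
        yield engine
    finally:
        # Runs on normal exit and on exceptions alike.
        disconnect()

# Stand-in callables so the sketch runs without a database connection.
calls = []

def fake_connect(**kwargs):
    calls.append(('connect', kwargs['host']))
    return 'engine'

def fake_disconnect():
    calls.append(('disconnect',))
    return True

try:
    with vantage_session(fake_connect, fake_disconnect, host='UDWTest') as eng:
        raise RuntimeError('simulated failure mid-analysis')
except RuntimeError:
    pass

print(calls)
```

In this notebook, the same pattern would be written as `with vantage_session(create_context, remove_context, host=host, username=user, password=password, logmech=logmech, database=defaultDB):`, with the analysis code inside the `with` block.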

<span style="font-size:16px;">For online documentation on Teradata Vantage analytic functions, refer to the [Teradata Developer Portal](https://docs.teradata.com/) and search for the phrases "Python User Guide" and "Python Function Reference".</span>