# Connectors - Snowflake

[YData SDK](https://pypi.org/project/ydata-sdk/) provides a seamless integration with Snowflake, allowing you to connect,
query, and manage your data in Snowflake with ease. This section will guide you through the benefits,
setup, and usage of the Snowflake connector within ydata-sdk.

### Benefits of Integration
Integrating YData SDK with Snowflake offers several key benefits:

- **Scalability:** Snowflake's architecture scales effortlessly with your data needs, while YData's tools ensure efficient data integration and management.
- **Performance:** Leveraging Snowflake's high performance for data querying and YData's optimization techniques enhances overall data processing speed.
- **Security:** Snowflake's robust security features, combined with YData's data governance capabilities, ensure your data remains secure and compliant.
- **Interoperability:** YData SDK simplifies the process of connecting to Snowflake, allowing you to quickly set up and start using the data without extensive configuration. Benefit from the unique ydata-sdk functionalities like data preparation with Python, synthetic data generation and data profiling.


### Authenticate with your YData account

In [None]:
# Authenticate with your ydata-sdk token - https://dashboard.ydata.ai/
import os

os.environ['YDATA_LICENSE_KEY'] = '{add-your-key}'
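
Hardcoding the key in a notebook makes it easy to leak. As a minimal sketch, assuming the key has already been exported in your shell, the check below fails early when the variable is missing or still holds the placeholder. `require_license_key` is a hypothetical helper written for this example, not part of ydata-sdk.

```python
import os


def require_license_key(env=None, name="YDATA_LICENSE_KEY"):
    """Return the license key from the environment, or raise if it is unset.

    Hypothetical helper for illustration; it is not part of ydata-sdk.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    # Treat the documentation placeholder '{add-your-key}' as "not set".
    if not key or (key.startswith("{") and key.endswith("}")):
        raise RuntimeError(f"{name} is not set; export it before using the SDK.")
    return key
```

Calling `require_license_key()` at the top of a notebook surfaces a missing key immediately rather than at the first connector call.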

## Create a Snowflake connector

In [None]:
USERNAME = "insert-username"
PASSWORD = "insert-password"
ACCOUNT_IDENTIFIER = "insert-account"
PORT = 443
DATABASE_NAME = "insert-DATABASE"
SCHEMA = "insert-SCHEMA"
WAREHOUSE = "insert-WAREHOUSE"

conn_str = {
    "hostname": ACCOUNT_IDENTIFIER,
    "username": USERNAME,
    "password": PASSWORD,
    "port": PORT,
    "database": DATABASE_NAME,
    "warehouse": WAREHOUSE
}
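
To keep credentials out of the notebook, the same dictionary can be assembled from environment variables. This is a sketch under stated assumptions: the `SNOWFLAKE_*` variable names and the `conn_str_from_env` helper are hypothetical, and only the dictionary keys come from the cell above.

```python
import os

# Hypothetical environment variable names; adapt them to your deployment.
REQUIRED = {
    "hostname": "SNOWFLAKE_ACCOUNT",
    "username": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
    "database": "SNOWFLAKE_DATABASE",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
}


def conn_str_from_env(env=None):
    """Build the connection dictionary from environment variables."""
    env = os.environ if env is None else env
    # Report every missing variable at once instead of failing one by one.
    missing = [var for var in REQUIRED.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    conn = {key: env[var] for key, var in REQUIRED.items()}
    # The port is optional and defaults to 443, as in the cell above.
    conn["port"] = int(env.get("SNOWFLAKE_PORT", 443))
    return conn
```

The result has the same keys as `conn_str`, so it can be passed to the connector in its place.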

In [None]:
from ydata.connectors import SnowflakeConnector

connector = SnowflakeConnector(conn_string=conn_str)

### Navigate your database

In [2]:
# List the available schemas
schemas = connector.list_schemas()  # returns a list of schemas

# Get the metadata of a database schema
schema = connector.get_database_schema(schema_name='PATIENTS')

INFO: 2024-05-24 20:06:32,083 Snowflake Connector for Python Version: 3.10.0, Python Version: 3.10.12, Platform: Linux-5.10.186-179.751.amzn2.x86_64-x86_64-with-glibc2.35
INFO: 2024-05-24 20:06:32,085 This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
INFO: 2024-05-24 20:06:33,378 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:33,718 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:33,916 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:34,105 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:34,290 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:34,472 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:34,736 Number of results in first chunk: 3
INFO: 2024-05-24 20:06:34,923 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:35,102 Nu

## Read from your Snowflake instance
Using the Snowflake connector, you can:
- Get the data from a Snowflake table
- Get a sample from a Snowflake table
- Get the data from a query to a Snowflake instance
- Get the full data from a selected database

In [3]:
table = connector.get_table(table='cardio_test')
print(table)

INFO: 2024-05-24 20:06:40,197 Snowflake Connector for Python Version: 3.10.0, Python Version: 3.10.12, Platform: Linux-5.10.186-179.751.amzn2.x86_64-x86_64-with-glibc2.35
INFO: 2024-05-24 20:06:40,200 This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
INFO: 2024-05-24 20:06:40,886 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:41,067 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:41,240 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:41,427 Number of results in first chunk: 1
INFO: 2024-05-24 20:06:41,619 Number of results in first chunk: 0
INFO: 2024-05-24 20:06:43,651 Number of results in first chunk: 76
INFO: 2024-05-24 20:06:43,866 Number of results in first chunk: 0
INFO: 2024-05-24 20:06:44,057 Number of results in first chunk: 0
INFO: 2024-05-24 20:06:44,267 N

In [4]:
table_sample = connector.get_table_sample(table='cardio_test', 
                                          sample_size=50)
print(table_sample)

INFO: 2024-05-24 20:07:37,251 Number of results in first chunk: 50
INFO: 2024-05-24 20:07:37,253 Snowflake Connector for Python Version: 3.10.0, Python Version: 3.10.12, Platform: Linux-5.10.186-179.751.amzn2.x86_64-x86_64-with-glibc2.35
INFO: 2024-05-24 20:07:37,254 This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
INFO: 2024-05-24 20:07:37,751 Number of results in first chunk: 50
INFO: 2024-05-24 20:07:37,932 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:38,114 Number of results in first chunk: 1
Dataset

Shape: (50, 12)
Schema:
         Column Variable type
0            id         float
1           age         float
2        height         float
3        weight         float
4         ap_hi         float
5         ap_lo         float
6   cholesterol         

In [5]:
query_output = connector.query(query="SELECT * FROM patients.cardio_test;")
print(query_output)

INFO: 2024-05-24 20:07:38,476 Number of results in first chunk: 1000
INFO: 2024-05-24 20:07:38,714 Number of results in first chunk: 1000
INFO: 2024-05-24 20:07:38,894 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:39,077 Number of results in first chunk: 1
Dataset

Shape: (1000, 12)
Schema:
         Column Variable type
0            id           int
1           age           int
2        height           int
3        weight           int
4         ap_hi           int
5         ap_lo           int
6   cholesterol          bool
7          gluc          bool
8         smoke          bool
9          alco          bool
10       active          bool
11       cardio          bool




### Read from a database
If you need to replicate an entire database or perform actions such as joining or merging full tables, you can read all tables within a schema or a specified set of tables using the Snowflake connector. The following actions are possible with the Snowflake connector:

- Read an entire database in either lazy or non-lazy mode.
- Read a specific set of tables.

#### Lazy mode
Lazy mode in YData Fabric's RDBMS connectors lets you create an iterator that defers reading data from the database tables until an action requires it. This approach optimizes performance and resource usage by loading data only when necessary.

When using lazy mode, the data is not immediately fetched from the database. Instead, the connector sets up an iterator that references the tables. Data is read only when you perform actions that require accessing the actual data, such as counting the number of rows, joining tables, or filtering data.
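
The deferral described above can be illustrated with a toy model in plain Python. This is a conceptual sketch only, not the connector's implementation: `LazyTables` and `fake_loader` are hypothetical names, and the "tables" are in-memory lists standing in for database reads.

```python
from typing import Callable, Dict, List

Rows = List[dict]


class LazyTables:
    """Toy stand-in for a lazy database read: holds loaders, not data."""

    def __init__(self, loaders: Dict[str, Callable[[], Rows]]):
        self._loaders = loaders
        self._cache: Dict[str, Rows] = {}

    def __getitem__(self, name: str) -> Rows:
        # The loader runs on first access only; later accesses hit the cache.
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()
        return self._cache[name]

    def loaded(self) -> List[str]:
        """Names of the tables that have actually been read so far."""
        return sorted(self._cache)


calls = []


def fake_loader(name, rows):
    """Return a loader that records when a 'read' actually happens."""
    def load():
        calls.append(name)
        return rows
    return load


db = LazyTables({
    "cardio_test": fake_loader("cardio_test", [{"id": 1}, {"id": 2}]),
    "cardio_test2": fake_loader("cardio_test2", [{"id": 3}]),
})

print(db.loaded())             # [] : nothing has been read yet
print(len(db["cardio_test"]))  # 2  : counting rows triggers the first read
print(db.loaded())             # ['cardio_test']
```

Only the table that was touched is read; `cardio_test2` stays unloaded until something asks for its rows.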

In [6]:
database = connector.read_database(lazy=True)
print(database)

INFO: 2024-05-24 20:07:39,369 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:39,587 Number of results in first chunk: 7
INFO: 2024-05-24 20:07:39,771 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:39,971 Number of results in first chunk: 0
INFO: 2024-05-24 20:07:41,421 Number of results in first chunk: 76
INFO: 2024-05-24 20:07:41,642 Number of results in first chunk: 0
INFO: 2024-05-24 20:07:41,860 Number of results in first chunk: 0
INFO: 2024-05-24 20:07:42,073 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:42,273 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:42,500 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:42,693 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:42,896 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:43,112 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:43,319 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:43,495 Number of results in first chunk: 1
Multi

In [7]:
tables = connector.get_tables(tables=['cardio_test', 'cardio_test2'])
print(tables)

INFO: 2024-05-24 20:07:43,548 Snowflake Connector for Python Version: 3.10.0, Python Version: 3.10.12, Platform: Linux-5.10.186-179.751.amzn2.x86_64-x86_64-with-glibc2.35
INFO: 2024-05-24 20:07:43,555 This connection is in OCSP Fail Open Mode. TLS Certificates would be checked for validity and revocation status. Any other Certificate Revocation related exceptions or OCSP Responder failures would be disregarded in favor of connectivity.
INFO: 2024-05-24 20:07:44,276 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:44,463 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:44,651 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:44,842 Number of results in first chunk: 1
INFO: 2024-05-24 20:07:45,030 Number of results in first chunk: 0
INFO: 2024-05-24 20:07:46,774 Number of results in first chunk: 76
INFO: 2024-05-24 20:07:46,975 Number of results in first chunk: 0
INFO: 2024-05-24 20:07:47,462 Number of results in first chunk: 0
INFO: 2024-05-24 20:07:47,657 N

## Write to your Snowflake
If you need to write data into a Snowflake instance, you can use the same Snowflake connector for the following actions:

- Write the data into a table
- Write a new database schema

In [None]:
# Write the data to a new table called cardio in the configured schema.
# if_exists lets you decide whether to append, replace, or fail when a table with the same name already exists in the schema.
connector.write_table(data=tables['cardio_test'],
                      name='cardio',
                      if_exists='fail')
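
The three `if_exists` modes can be pictured with a toy in-memory model. This sketch assumes the semantics follow the convention used by pandas `DataFrame.to_sql` (`fail` raises, `replace` overwrites, `append` adds rows). `write_rows` is a hypothetical function, and the connector's exact behavior and exception types should be checked against the ydata-sdk documentation.

```python
def write_rows(store, name, rows, if_exists="fail"):
    """Write rows into a dict-backed 'schema' under the given table name."""
    if if_exists not in {"fail", "replace", "append"}:
        raise ValueError(f"Unknown if_exists mode: {if_exists!r}")
    if name not in store:
        # No conflict: every mode simply creates the table.
        store[name] = list(rows)
    elif if_exists == "fail":
        raise ValueError(f"Table {name!r} already exists")
    elif if_exists == "replace":
        store[name] = list(rows)
    else:  # append
        store[name].extend(rows)
    return store


schema = {}
write_rows(schema, "cardio", [{"id": 1}])                       # creates the table
write_rows(schema, "cardio", [{"id": 2}], if_exists="append")   # table now has 2 rows
write_rows(schema, "cardio", [{"id": 3}], if_exists="replace")  # table has 1 row again
```

With the default `if_exists="fail"`, a second write to `cardio` raises instead of silently touching existing data, which is why the cell above uses it as the safe choice.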

In [None]:
# Write the database as a new schema.
# table_names lets you define a new name for each table in the database. If it is not provided, the table names from your dataset are used.
connector.write_database(data=database,
                         schema_name='new_cardio',
                         table_names={'cardio_test': 'cardio'})
