# Labs SDK - Datasources Quick Start
This SDK lets you interact with the platform components programmatically from your Labs. This example notebook covers interaction with *Connectors & DataSources*.

Every SDK method accepts a `namespace` argument that lets you override the namespace you are currently in. The namespace is the UID of your current *Project*.
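
To make the override behaviour concrete, here is a minimal sketch that mimics it with a stand-in function. `list_connectors`, `CURRENT_NAMESPACE`, and the in-memory registry are hypothetical names invented for illustration and are not part of the SDK. Only the idea of an optional `namespace` argument that falls back to the current Project comes from the description above.

```python
from typing import Optional

# Hypothetical stand-ins: in a Lab, the SDK resolves the current Project itself.
CURRENT_NAMESPACE = "project-a-uid"
_REGISTRY = {
    "project-a-uid": ["connector-1", "connector-2"],
    "project-b-uid": ["connector-3"],
}


def list_connectors(namespace: Optional[str] = None) -> list:
    """Return connector names for `namespace`, defaulting to the current Project."""
    resolved = namespace if namespace is not None else CURRENT_NAMESPACE
    return list(_REGISTRY.get(resolved, []))


print(list_connectors())                           # uses the current Project
print(list_connectors(namespace="project-b-uid"))  # explicit override
```

With the SDK, the equivalent call would pass `namespace=` to a method such as `DataSources.get`, as done later in this notebook.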

### What can I do?

* Access the Connectors already created in the UI
* Access the Datasources already created in the UI
* Create a Connector that you are using in the Lab so that it is available in the UI
* Create a new Dataset in the UI

##### Auxiliary functions

In [106]:
from ipywidgets import widgets
from IPython.display import display

conn_name = widgets.Text(
    description='Connector Name',
    disabled=False
)
Key_id = widgets.Text(
    description='Key ID',
    disabled=False
)

key_secret = widgets.Text(
    description='Key secret',
    disabled=False
)

region = widgets.Text(
    description='Region',
    disabled=False
)

conn_inputs = widgets.HBox([conn_name, Key_id, key_secret, region])

In [107]:
ds_name = widgets.Text(
    description='Datasource Name',
    disabled=False
)
ds_path = widgets.Text(
    description='Dataset path',
    disabled=False
)

# Use a separate variable so the connector form (conn_inputs) is not overwritten
dataset_inputs = widgets.HBox([ds_name, ds_path])

## Connectors Module

### What is a Connector?

A Connector, as its name implies, provides a reusable connection through which DataSources access data.

Each connector is responsible for implementing its own methods and behaviour, but all of them return a Connection. The SDK supports the following connector actions:

Connectors:

- **Connectors.list** - List all available Connectors within the user namespace
- **Connectors.get**  - Get the details of a particular Connector, such as uid, name, credentials, and type (aws-s3, azure-blob, gcs, file, mysql, azure-sql, google-bigquery).

In [1]:
# Get the list of available connectors
from ydata.labs import Connectors

Connectors.list()

[FileConnector(uid=39b03a65-33cd-421b-a580-852ac0db39c3, name=House pricing, path=house_price_train.csv) ,
 GoogleCloudStorageConnector(uid=5bbf90c6-f41e-4690-93a7-b8fc3fbc1f09, name=YData Synthetic) ,
 FileConnector(uid=6c760e79-de4b-4565-8d71-98b878d25641, name=New validation, path=data.csv) ,
 AWSS3Connector(uid=7a32a105-40c8-4f80-a852-b82d89f46d17, name=S3 Conn, region=eu-central-1) ,
 FileConnector(uid=7b374793-bdba-41fc-845d-afa4fe00be62, name=Teste upload, path=data.csv) ,
 AzureSQLConnector(uid=9a173727-778f-4ba0-969e-e01953b8b89f, name=Berka, host=ydata.database.windows.net, port=1433, database=berka) ,
 GoogleCloudStorageConnector(uid=9c38f5f0-3348-4371-ac94-ef998fa93fd8, name=Academy) ,
 FileConnector(uid=e1992fe0-9a25-4abe-9273-1b212a02881a, name=Validation upload, path=data.csv) ,
 MySQLConnector(uid=f0f53f7b-677e-4b9a-b2c7-68f1ce79ca15, name=Berka validation, host=datascience-tests.c1xxv3f18hni.eu-west-1.rds.amazonaws.com, port=3306, database=berka) ]
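
When the list is long, it can help to narrow it down to one connector type. The sketch below filters by class name. The two dataclasses are stand-ins defined here so the example runs without the SDK, and `filter_by_type` is a hypothetical helper, not an SDK function. It assumes the objects returned by `Connectors.list()` expose `uid` and `name` attributes, as their printed form suggests.

```python
from dataclasses import dataclass


# Stand-in classes so the sketch runs without the SDK; in a Lab, the objects
# returned by Connectors.list() would be used instead.
@dataclass
class FileConnector:
    uid: str
    name: str


@dataclass
class AWSS3Connector:
    uid: str
    name: str


def filter_by_type(connectors, type_name):
    """Keep only the connectors whose class name equals `type_name`."""
    return [c for c in connectors if type(c).__name__ == type_name]


connectors = [
    FileConnector(uid="uid-1", name="example file"),
    AWSS3Connector(uid="uid-2", name="example s3"),
    FileConnector(uid="uid-3", name="another file"),
]

file_connectors = filter_by_type(connectors, "FileConnector")
print([c.name for c in file_connectors])
```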

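The next section creates a new Connector and Datasource from the Lab. Alternatively, a Connector already created in the UI can be retrieved with `Connectors.get` and reused.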

## How to create a Connector and a Datasource in the UI?

For this example we use an *AWS S3 Connector*, but there are several other connectors that you can leverage.

### Create a Connector (AWSS3Connector)

In [104]:
display(conn_inputs)

HBox(children=(Text(value='', description='Connector Name'), Text(value='', description='Key ID'), Text(value=…
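
The form fields start out empty, so it may be worth checking them before creating the connector. The sketch below shows one way to do that. `missing_fields` is a hypothetical helper, and the dictionary holds example values. In the notebook, the values would come from `conn_name.value`, `Key_id.value`, `key_secret.value`, and `region.value`.

```python
def missing_fields(values):
    """Return the labels whose value is empty or only whitespace."""
    return [label for label, value in values.items() if not value.strip()]


# Example values standing in for the widget contents.
form = {
    "Connector Name": "my-s3-connector",
    "Key ID": "",
    "Key secret": "   ",
    "Region": "eu-central-1",
}

missing = missing_fields(form)
if missing:
    print("Please fill in: " + ", ".join(missing))
```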

In [94]:
from ydata.labs import AWSS3Connector
from ydata.labs import AWSS3DataSource, DataType, FileType

connector = AWSS3Connector(name=conn_name.value,
                           key_id=Key_id.value,
                           key_secret=key_secret.value,
                           region=region.value)
connector.create()

### Create & Explore the Datasource from the connector

In [108]:
display(dataset_inputs)

HBox(children=(Text(value='', description='Datasource Name'), Text(value='', description='Dataset path')))
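
The datasource needs a file type that matches the dataset path. As a minimal sketch, the type name could be derived from the file extension. `infer_file_type` is a hypothetical helper, not part of the SDK, and the bucket paths are made up. It returns the names `CSV` or `PARQUET`, which correspond to the `FileType` members used in this notebook.

```python
import posixpath


def infer_file_type(path):
    """Map a dataset path to 'CSV' or 'PARQUET' based on its extension."""
    extension = posixpath.splitext(path)[1].lower()
    if extension == ".csv":
        return "CSV"
    if extension == ".parquet":
        return "PARQUET"
    raise ValueError(f"Unsupported file extension: {extension!r}")


print(infer_file_type("s3://example-bucket/data/train.csv"))
print(infer_file_type("s3://example-bucket/data/train.parquet"))
```

In a Lab, the returned name could then be mapped to the matching member, for instance with `getattr(FileType, infer_file_type(ds_path.value))`, assuming `FileType` exposes its members as attributes.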

In [3]:
datasource = AWSS3DataSource(name=ds_name.value,
                             connector=connector,
                             data_type=DataType.TABULAR,  # data type can be TABULAR or TIMESERIES
                             path=ds_path.value,
                             file_type=FileType.CSV,  # file type can be CSV or PARQUET
                             sub_sample=1000,  # Optional[int], defaults to None
                             separator=','  # str, defaults to ','
                            )

datasource.create()

In [None]:
# List the available datasources to confirm the creation
from ydata.labs import DataSources

In [9]:
DataSources.list()

[GCSDataSource(uid='32c54776-c090-4df4-92f4-5d47459b1fee', name='Customer', metadata=Metadata(columns=[Column(name='CustomerID', data_type='categorical', var_type='string'), Column(name='Count', data_type='numerical', var_type='int'), Column(name='Country', data_type='categorical', var_type='string'), Column(name='State', data_type='categorical', var_type='string'), Column(name='City', data_type='categorical', var_type='string'), Column(name='Zip Code', data_type='numerical', var_type='int'), Column(name='Lat Long', data_type='categorical', var_type='string'), Column(name='Latitude', data_type='numerical', var_type='float'), Column(name='Longitude', data_type='numerical', var_type='float'), Column(name='Gender', data_type='categorical', var_type='string'), Column(name='Senior Citizen', data_type='categorical', var_type='string'), Column(name='Partner', data_type='categorical', var_type='string'), Column(name='Dependents', data_type='categorical', var_type='string'), Column(name='Tenure

### Writing a Dataset using a Connector created in the UI

In [17]:
from ydata.labs import Connectors
from ydata.labs import DataSources
from ydata.metadata import Metadata

# Creating a Dataset from the Data Source
datasource = DataSources.get(uid='datasource-uid',
                             namespace='namespace-id')

dataset = datasource.read()

dataset.head()

| | CustomerID | Count | Country | State | City | Zip Code | Lat Long | Latitude | Longitude | Gender | ... | Streaming Movies | Contract | Paperless Billing | Payment Method | Monthly Charges | Total Charges | Churn Label | Churn Value | CLTV | Churn Reason |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3668-QPYBK | 1 | United States | California | Los Angeles | 90003 | 33.964131, -118.272783 | 33.964131 | -118.272783 | Male | ... | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes | 1 | 3239 | Competitor made better offer |
| 1 | 9237-HQITU | 1 | United States | California | Los Angeles | 90005 | 34.059281, -118.30742 | 34.059281 | -118.30742 | Female | ... | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes | 1 | 2701 | Moved |
| 2 | 9305-CDSKC | 1 | United States | California | Los Angeles | 90006 | 34.048013, -118.293953 | 34.048013 | -118.293953 | Female | ... | Yes | Month-to-month | Yes | Electronic check | 99.65 | 820.5 | Yes | 1 | 5372 | Moved |
| 3 | 7892-POOKP | 1 | United States | California | Los Angeles | 90010 | 34.062125, -118.315709 | 34.062125 | -118.315709 | Female | ... | Yes | Month-to-month | Yes | Electronic check | 104.8 | 3046.05 | Yes | 1 | 5003 | Moved |
| 4 | 0280-XJGEX | 1 | United States | California | Los Angeles | 90015 | 34.039224, -118.266293 | 34.039224 | -118.266293 | Male | ... | Yes | Month-to-month | Yes | Bank transfer (automatic) | 103.7 | 5036.3 | Yes | 1 | 5340 | Competitor had better devices |


In [18]:
# Get the original datasource connector so we can write a new version of the dataset to the same bucket
conn = datasource.connector

In [25]:
# Filter the dataset
list_columns = []  # list of columns to subselect from the dataset
fltr_dataset = dataset.select_columns(list_columns).head()

In [None]:
# Write the data
conn.write_file(data=fltr_dataset,
                path='{output-path}')
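
Since the goal is to write a new version of the dataset to the same bucket, the output path has to differ from the original one. The sketch below shows one possible naming scheme. `versioned_path` is a hypothetical helper, and the bucket path is made up for illustration. The SDK does not require this convention.

```python
import posixpath


def versioned_path(path, version):
    """Insert a `_v<version>` suffix before the file extension."""
    root, extension = posixpath.splitext(path)
    return f"{root}_v{version}{extension}"


print(versioned_path("s3://example-bucket/data/customers.csv", 2))
# prints s3://example-bucket/data/customers_v2.csv
```

The result could then be passed as the `path` argument of `conn.write_file`.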