# Loading Data From SQL Databases
Pandas has native support for reading from and writing to various SQL databases.<br>
You first create a database connection using the database-specific library, then use `pd.read_sql()` or `pd.read_sql_query()` to read a table or query result into a DataFrame. Once you have a DataFrame object, you can manipulate it and store it in the iguazio database or in time-series tables.
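
The connect, query, and manipulate flow described above can be tried without any database server. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for a real server; the table contents are a few sample rows for illustration, and the `demo_`-prefixed names are hypothetical so they do not clash with variables used later in this notebook.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for a real database server (assumption for illustration).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE family (rfam_acc TEXT, rfam_id TEXT)")
demo_conn.executemany(
    "INSERT INTO family VALUES (?, ?)",
    [("RF00001", "5S_rRNA"), ("RF00002", "5_8S_rRNA"), ("RF00005", "tRNA")],
)
demo_conn.commit()

# read_sql_query runs the statement and returns the result set as a DataFrame.
demo_df = pd.read_sql_query(
    "SELECT rfam_acc, rfam_id FROM family ORDER BY rfam_acc", demo_conn
)

# From here on it is an ordinary DataFrame and can be filtered or transformed.
demo_rrna_df = demo_df[demo_df["rfam_id"].str.contains("rRNA")]
print(demo_rrna_df)

demo_conn.close()
```

The same `pd.read_sql_query()` call works with other DB-API connections, which is how the MySQL example below uses it.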

The following example demonstrates working with MySQL.

In [None]:
# install MySQL lib
!pip install pymysql 

Set up the database connection. This example uses a public MySQL database called Rfam (https://rfam.readthedocs.io/en/latest/database.html).<br>
Then run the SQL query and store the result in a DataFrame.

In [6]:
import os
import pymysql
import pandas as pd 

conn = pymysql.connect(
    host=os.getenv('DB_HOST','mysql-rfam-public.ebi.ac.uk'),
    port=4497,
    user=os.getenv('DB_USER','rfamro'),
    passwd=os.getenv('DB_PASSWORD',''),
    db=os.getenv('DB_NAME','Rfam'),
    charset='utf8mb4')

df = pd.read_sql_query("select rfam_acc,rfam_id,auto_wiki,description,author,seed_source FROM family",
    conn) 

df.tail(10)

index,rfam_acc,rfam_id,auto_wiki,description,author,seed_source
2781,RF02880,MH_s15,2551,Mesorhizobail RNA 15,Argasinska J,Argasinska J
2782,RF02881,MH_s25,2551,Mesorhizobail RNA 25,Argasinska J,Argasinska J
2783,RF02882,MH_s36,2551,Mesorhizobail RNA 36,Argasinska J,Argasinska J
2784,RF02883,BcKCsr2,2451,Burkholderia sRNA 2,Argasinska J,Argasinska J
2785,RF02884,BcKCsr7,2451,Burkholderia sRNA 7,Argasinska J,Argasinska J
2786,RF02885,SAM_VI,2552,SAM-VI riboswitch,Argasinska J,Argasinska J
2787,RF02886,npcTB_6715,2178,Mycobacterium sRNA 6715,Argasinska J,Argasinska J
2788,RF02887,mgtC_leader,2553,Salmonella mgtC leader RNA,Argasinska J,Argasinska J
2789,RF02888,BtsR1,2554,Bacillus sRNA 1,Argasinska J,Argasinska J
2790,RF02889,Sr006,2555,Pseudomonas sRNA 6,Argasinska J,Argasinska J


## Writing the results to iguazio Key/Value Database
The following section demonstrates establishing a connection with the iguazio high-performance DataFrames service (v3io_frames) and writing the data read from the SQL database.<br>
The iguazio database supports multiple models (KV/NoSQL, time-series, stream, object). The model is specified as the first argument of calls such as `client.write()`. Read more in: `TBD Frames link`

In [7]:
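import os
import pandas as pd
import v3io_frames as v3f

# Connect to the frames service; the password is read from the environment
client = v3f.Client('v3io-framesd:8081', container='users', password=os.getenv('V3IO_PASSWORD'))
# Path of the target table inside the container
tablename = 'iguazio/examples/family'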

Ingest the data into the database using the NoSQL API.

In [8]:
client.write('kv', tablename, df)

## Using Pandas streaming capabilities to copy large datasets 
Many pandas input/output functions, including those for SQL, CSV, and iguazio, support chunking. With chunking, the driver forms a continuous iterator and data is read or written chunk by chunk.
You specify `chunksize` (the number of rows per chunk), and the reader returns a DataFrame iterator instead of a single DataFrame. This iterator can be passed as is to a DataFrame writer such as iguazio frames.
The following example streams data from MySQL to the iguazio NoSQL API.

In [10]:
tablename2 = 'iguazio/examples/family2'
df_iter = pd.read_sql("select rfam_acc,rfam_id,auto_wiki,description,author,seed_source FROM family", conn, chunksize=1000)
client.write('kv', tablename2, df_iter)
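
To see what the chunked iterator yields, it can help to consume one by hand. The following is a minimal sketch, assuming an in-memory SQLite table of ten integers in place of the MySQL database; the `numbers` table and the `demo_`-prefixed names are hypothetical and exist only for this illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite table with 10 rows (assumption for illustration).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE numbers (n INTEGER)")
demo_conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(10)])
demo_conn.commit()

# With chunksize set, the call returns an iterator of DataFrames
# rather than one DataFrame holding the whole result.
demo_iter = pd.read_sql_query(
    "SELECT n FROM numbers ORDER BY n", demo_conn, chunksize=4
)

demo_chunk_sizes = []
demo_total = 0
for chunk in demo_iter:
    # Each chunk is a regular DataFrame with at most `chunksize` rows.
    demo_chunk_sizes.append(len(chunk))
    demo_total += int(chunk["n"].sum())

demo_conn.close()
print(demo_chunk_sizes, demo_total)  # [4, 4, 2] 45
```

Only one chunk is held in memory at a time, which is why this pattern suits tables too large to load in a single DataFrame. The iterator can be consumed only once, so create a new one if the data has to be read again.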

## Remove Data

In [11]:
client.delete('kv', tablename)
client.delete('kv', tablename2)