# Pandas with SQL

The [`pandas.io.sql` module](http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#sql-queries) provides a collection of query wrappers that both facilitate data retrieval and reduce dependency on database-specific APIs. It supports multiple driver libraries through SQLAlchemy, and also accepts plain `sqlite3` connections.

## Query
To run a query, pass a SQL statement and a connection (or engine) to `read_sql`; the result is loaded into a DataFrame:

In [1]:
import pandas as pd
import sqlite3
conn = sqlite3.connect("sqlalchemy_example.db")
pd.read_sql("select * from address;", conn)

Unnamed: 0,id,street_name,street_number,post_code,person_id
0,1,North Street,21,46202,1


In [2]:
from sqlalchemy import create_engine
engine = create_engine('sqlite:///sqlalchemy_example.db')
pd.read_sql("select * from address;", engine)

Unnamed: 0,id,street_name,street_number,post_code,person_id
0,1,North Street,21,46202,1


In [3]:
pd.read_sql("select * from person;", engine)

Unnamed: 0,id,name
0,1,John Doe


In [4]:
df = pd.read_sql("select * from person, address where person.id = address.person_id;", engine)
df

Unnamed: 0,id,name,id.1,street_name,street_number,post_code,person_id
0,1,John Doe,1,North Street,21,46202,1
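
The join above returns two columns named `id` (shown as `id` and `id.1` in the output), which is easy to misread. A minimal sketch of one way around this, assuming a throwaway in-memory SQLite database that mimics the two tables: select the columns explicitly and alias them. The aliases `person_id` and `address_id` and the variable names are choices made for this example only.

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database mimicking the tutorial's tables
# (an assumption for this sketch, not the original .db file).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE address (
        id INTEGER PRIMARY KEY,
        street_name TEXT,
        street_number TEXT,
        post_code TEXT NOT NULL,
        person_id INTEGER REFERENCES person(id)
    );
    INSERT INTO person (id, name) VALUES (1, 'John Doe');
    INSERT INTO address (id, street_name, street_number, post_code, person_id)
    VALUES (1, 'North Street', '21', '46202', 1);
    """
)

# Explicit column list with aliases, so no two result columns share a name.
query = """
    SELECT p.id AS person_id, p.name, a.id AS address_id, a.street_name, a.post_code
    FROM person AS p
    JOIN address AS a ON p.id = a.person_id;
"""
joined = pd.read_sql(query, conn)
conn.close()
```

With the aliases in place, each column of `joined` can be selected by a unique name.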


In [5]:
print(type(df))

<class 'pandas.core.frame.DataFrame'>
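
When a query depends on a runtime value, `read_sql` accepts a `params` argument so the value is bound by the driver instead of being pasted into the SQL string. The sketch below assumes a `sqlite3` connection, whose placeholder is `?`; other drivers use different placeholder styles. The second person row and the variable `wanted_name` are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory database with illustrative rows (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO person (id, name) VALUES (?, ?)",
    [(1, "John Doe"), (2, "Jane Roe")],
)
conn.commit()

# The value is passed separately from the SQL text and bound to the `?` placeholder.
wanted_name = "John Doe"
people = pd.read_sql(
    "SELECT id, name FROM person WHERE name = ?",
    conn,
    params=(wanted_name,),
)
conn.close()
```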


## Insert and Update

pandas DataFrames have a `to_sql` method that lets you easily insert your data into a database table. By default it issues one INSERT per row, which can be very slow for large tables.
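
If `to_sql` is too slow, its `chunksize` and `method` parameters are worth trying before switching tools. The sketch below, which assumes an in-memory SQLite database and a hypothetical table named `numbers`, writes 1000 rows in batches of 100 using multi-row INSERT statements. Whether this is actually faster depends on the database and driver, so treat it as something to measure rather than a guaranteed speedup.

```python
import sqlite3

import pandas as pd

# Illustrative data: 1000 rows, two integer columns.
frame = pd.DataFrame({"n": range(1000), "square": [i * i for i in range(1000)]})

conn = sqlite3.connect(":memory:")

# method="multi" packs several rows into one INSERT statement;
# chunksize caps how many rows go into each statement.
frame.to_sql(
    "numbers",
    conn,
    index=False,
    if_exists="replace",
    method="multi",
    chunksize=100,
)

# Read the row count back to confirm that everything was written.
count = pd.read_sql("SELECT COUNT(*) AS n_rows FROM numbers", conn).loc[0, "n_rows"]
conn.close()
```

Keeping `chunksize` modest matters with `method="multi"`, because databases limit the number of bound parameters per statement.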

In [6]:
df = pd.read_sql("select * from address;", engine)
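# DataFrame.append was removed in pandas 2.0; build a one-row frame and concatenate instead.
new_row = pd.DataFrame([{'street_name':'Newington Road', 'street_number':'15',
           'post_code':'12121', 'person_id':2}])
df2 = pd.concat([df, new_row], ignore_index=True)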
df2

Unnamed: 0,id,street_name,street_number,post_code,person_id
0,1.0,North Street,21,46202,1
1,,Newington Road,15,12121,2


In [7]:
engine = create_engine('sqlite:///sqlalchemy_example2.db')
df2.to_sql('address', con=engine, index=False, if_exists='replace')

In [8]:
engine = create_engine('sqlite:///sqlalchemy_example2.db')
pd.read_sql_query("select * from address;", engine)

Unnamed: 0,id,street_name,street_number,post_code,person_id
0,1.0,North Street,21,46202,1
1,,Newington Road,15,12121,2


In [9]:
df2

Unnamed: 0,id,street_name,street_number,post_code,person_id
0,1.0,North Street,21,46202,1
1,,Newington Road,15,12121,2


In [10]:
df2.to_dict(orient="records")

[{'id': 1.0,
  'street_name': 'North Street',
  'street_number': '21',
  'post_code': '46202',
  'person_id': 1},
 {'id': nan,
  'street_name': 'Newington Road',
  'street_number': '15',
  'post_code': '12121',
  'person_id': 2}]
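
The second record above carries `'id': nan` because the new row has no primary key yet. SQLite accepts this and generates the key, but other databases or drivers may treat NaN differently, so it can be safer to drop missing fields before inserting. A minimal sketch, assuming a frame shaped like `df2`; the names `frame` and `records` are chosen for this example:

```python
import pandas as pd

# Rebuild a frame shaped like df2: the second row has no id yet.
frame = pd.DataFrame(
    [
        {"id": 1, "street_name": "North Street", "street_number": "21",
         "post_code": "46202", "person_id": 1},
        {"street_name": "Newington Road", "street_number": "15",
         "post_code": "12121", "person_id": 2},
    ]
)

# Drop every field whose value is missing, so the database
# can fill in defaults such as an auto-generated primary key.
records = [
    {key: value for key, value in row.items() if pd.notna(value)}
    for row in frame.to_dict(orient="records")
]
```

The resulting `records` list has the same shape as the `to_dict` output, minus the missing `id`, and could be passed to a bulk insert in the same way.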

In [11]:
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
# declarative_base lives in sqlalchemy.orm since SQLAlchemy 1.4;
# the sqlalchemy.ext.declarative import path is deprecated.
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Person(Base):
    __tablename__ = 'person'
    # Here we define columns for the table person
    # Notice that each column is also a normal Python instance attribute.
    id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)

class Address(Base):
    __tablename__ = 'address'
    # Here we define columns for the table address.
    # Notice that each column is also a normal Python instance attribute.
    id = Column(Integer, primary_key=True)
    street_name = Column(String(250))
    street_number = Column(String(250))
    post_code = Column(String(250), nullable=False)
    person_id = Column(Integer, ForeignKey('person.id'))
    person = relationship(Person)

engine = create_engine('sqlite:///sqlalchemy_example3.db')
Base.metadata.create_all(engine)
    
Session = sessionmaker(bind=engine)
session = Session()
session.bulk_insert_mappings(Address, df2.to_dict(orient="records"))
session.commit()
session.close()

The SQLAlchemy ORM provides bulk operations such as `bulk_insert_mappings` and `bulk_update_mappings`. These methods emit INSERT and UPDATE statements from dictionaries or object states with lower Python overhead. They achieve this by directly exposing internal elements of the unit of work system.
(For more details, see https://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#bulk-operations)

The advantage of this solution is that it is fast while still working through the ORM's mapped classes and session.

In [12]:
engine = create_engine('sqlite:///sqlalchemy_example3.db')
pd.read_sql_query("select * from address;", engine)

Unnamed: 0,id,street_name,street_number,post_code,person_id
0,1,North Street,21,46202,1
1,2,Newington Road,15,12121,2


## Delete
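pandas doesn't have any command for deleting rows from a database table. From pandas itself, you can only overwrite the old table using `to_sql` with `if_exists='replace'`. Otherwise, delete the rows (or drop the table and insert a new one) through SQLAlchemy or the database driver.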
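
As a minimal sketch of the second route, the example below deletes one row with a plain SQL `DELETE` and then reads the table back with pandas. It assumes an in-memory database and uses the standard-library `sqlite3` driver instead of the SQLAlchemy ORM to stay self-contained; the sample rows mirror the `address` table used earlier.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# Seed an address table with two illustrative rows.
pd.DataFrame(
    {
        "id": [1, 2],
        "street_name": ["North Street", "Newington Road"],
        "street_number": ["21", "15"],
        "post_code": ["46202", "12121"],
        "person_id": [1, 2],
    }
).to_sql("address", conn, index=False)

# pandas has no delete call, so the DELETE goes through the driver directly.
conn.execute("DELETE FROM address WHERE street_name = ?", ("Newington Road",))
conn.commit()

remaining = pd.read_sql("SELECT * FROM address", conn)
conn.close()
```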