# Intro To Reading Data From A Database Into A Pandas DataFrame (and how to install packages)

A common data access pattern in pandas is to query a database directly into a pandas DataFrame, where the results can then be analyzed further. In this notebook, I walk through reading data from a MySQL database and show how the same approach applies to PostgreSQL.

To read data from a database into a pandas DataFrame, use SQLAlchemy (installed with Anaconda) to create an Engine object, which acts as the "bridge" between pandas and the database. SQLAlchemy supports multiple drivers for different databases, but the driver you want to use must be installed separately.

See examples in the SQLAlchemy documentation: http://docs.sqlalchemy.org/en/latest/core/engines.html

Some example drivers for different databases:

* MySQL: pymysql, mysqldb
* PostgreSQL: psycopg2, pg8000
* Microsoft SQL Server: pyodbc


## Installing the necessary driver

Before we can use SQLAlchemy, we must install the driver for the database we want to connect to. Most drivers can be installed as Python packages, even when the package includes compiled, non-Python code; conda takes care of these complexities for us.

There are *three* ways to install Python packages:

* Use Anaconda Navigator
* Run `conda install <package-name>` from a terminal or the Anaconda Prompt
* Run `pip install <package-name>` from a terminal or the Anaconda Prompt
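After installing a driver, you can check that Python can find it without connecting to any database. The sketch below uses only the standard library's `importlib.util.find_spec`; the `installed_drivers` helper and the `DRIVER_MODULES` mapping are hypothetical names for illustration. Note that a driver's import name does not always match its package name (for example, the `mysqlclient` package provides the `MySQLdb` module).

```python
import importlib.util

# Hypothetical mapping of label -> import name for some common drivers.
# The pip/conda package name can differ from the import name.
DRIVER_MODULES = {
    "MySQL (pymysql)": "pymysql",
    "MySQL (mysqldb)": "MySQLdb",
    "PostgreSQL (psycopg2)": "psycopg2",
    "PostgreSQL (pg8000)": "pg8000",
    "Microsoft SQL Server (pyodbc)": "pyodbc",
}


def installed_drivers(modules=DRIVER_MODULES):
    """Return {label: True/False} depending on whether each module can be found.

    find_spec() only locates the module; it does not import it.
    """
    return {
        label: importlib.util.find_spec(module_name) is not None
        for label, module_name in modules.items()
    }


for label, ok in installed_drivers().items():
    print(f"{label:32} {'installed' if ok else 'missing'}")
```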

# Example: Connecting to a MySQL database using the pymysql driver

## Step 1: Import the necessary packages

In [None]:
import pandas as pd
from sqlalchemy import create_engine

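# Drivers don't have to be imported: SQLAlchemy loads the one named in the URL.
# Importing one here is just a quick way to confirm it is installed.
# import pymysql  # driver for MySQL connections
# import psycopg2  # driver for PostgreSQL connections
# import pyodbc  # ODBC bridge: works with any database that has an ODBC driver (Microsoft SQL Server, MySQL, PostgreSQL, Hive, Google BigQuery, and more)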

## Step 2: Form the correct database url

The format for the database URL is: `dialect+driver://username:password@host:port/database`


* example for MySQL: "mysql+pymysql://test:123@132.148.86.167:3306/mydatabase" 

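* example for PostgreSQL (default port 5432): "postgresql+psycopg2://admin:secret123@145.134.99.167:5432/database"

* example for Microsoft SQL Server: "mssql+pyodbc://scott:tiger@ms_2008/mydb?driver=ODBC+Driver+17+for+SQL+Server" (the `driver` parameter names the ODBC driver installed on your machine)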

In [None]:
db_url = "mysql+pymysql://test:123@132.148.86.167:3306/mydatabase" 
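Typing the URL string by hand breaks as soon as the password contains characters such as `@` or `/`. A minimal sketch of an alternative, assuming SQLAlchemy 1.4 or later: build the URL from its parts with `URL.create()`, which escapes those characters when the URL is rendered. The password below is a hypothetical value chosen to contain special characters; the resulting `url` object can be passed directly to `create_engine()` in place of a string.

```python
from sqlalchemy.engine import URL, make_url

# Build the URL from named parts instead of concatenating a string.
url = URL.create(
    drivername="mysql+pymysql",  # dialect+driver
    username="test",
    password="p@ss/word",        # hypothetical password with special characters
    host="132.148.86.167",
    port=3306,
    database="mydatabase",
)

# Print the URL with the password masked (safe for logs and notebooks).
print(url.render_as_string(hide_password=True))

# Rendering with the password escapes it, so parsing the string back
# recovers the original password.
round_tripped = make_url(url.render_as_string(hide_password=False))
print(round_tripped.password == url.password)
```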

## Step 3: Create the database "engine" using create_engine from SQLAlchemy

In [None]:
engine = create_engine(db_url)

## Step 4: Query the database using pandas' read_sql() function (passing in the engine you created above)

In [None]:
sql = 'SELECT * FROM ShiftManagerApp_LaborSheet'
dataframe_with_query_results = pd.read_sql(sql, engine)
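The server above may not be reachable from your machine, so here is a minimal sketch of the same create_engine then read_sql workflow against an in-memory SQLite database (SQLite ships with Python, and SQLAlchemy includes its dialect). The `labor_sheet` table and its columns are hypothetical sample data, not the real ShiftManagerApp_LaborSheet schema. The sketch also passes a query value through `params=` instead of formatting it into the SQL string.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# "sqlite://" with no file path means an in-memory database.
engine = create_engine("sqlite://")

# Hypothetical sample rows written to a table so there is something to query.
sample = pd.DataFrame(
    {
        "employee": ["Ana", "Ben", "Cara"],
        "hours": [7.5, 8.0, 6.0],
    }
)
sample.to_sql("labor_sheet", engine, index=False)

# Parameterized query: :min_hours is bound by the driver, which avoids
# quoting bugs and SQL injection.
query = text(
    "SELECT employee, hours FROM labor_sheet "
    "WHERE hours >= :min_hours ORDER BY employee"
)
df = pd.read_sql(query, engine, params={"min_hours": 7})
print(df)
```

To run the query against a real server instead, replace the SQLite URL with your own `db_url`.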

# What about other databases?

## I have yet to hear of a database that Python cannot connect to...

* AWS: https://aws.amazon.com/sdk-for-python/
* Oracle: https://www.oracle.com/technetwork/articles/dsl/python-091105.html
* Apache Hive (PyHive): https://community.hortonworks.com/articles/97062/query-hive-using-python.html
* MongoDB: https://docs.mongodb.com/ecosystem/drivers/python/

## Questions or Comments About This Notebook?
Feel free to contact me via my LinkedIn: https://www.linkedin.com/in/william-j-henry <br>
You can also email me at will@henryanalytics.com <br>