In [None]:
import pandas as pd
from sqlalchemy import create_engine, Table, MetaData, select, and_, desc, func

In [None]:
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 300)
pd.set_option('display.expand_frame_repr', True)

### Data Files Location

* Most data files for the exercises can be found on the [course site](https://www.datacamp.com/courses/introduction-to-relational-databases-in-python)
    * [Census (CSV)](https://assets.datacamp.com/production/repositories/274/datasets/7a5a4567430ee737c70994d1c4747f252e0fd527/census.csv)
    * [Census (SQLite)](https://assets.datacamp.com/production/repositories/274/datasets/f6eda83e7fb90ac06a22af4132a355933763785c/census.sqlite)
    * [Employees (SQLite)](https://assets.datacamp.com/production/repositories/274/datasets/af705f788c225cad7e6ef405ed5490db36ed03bf/employees.sqlite)
* Other data files may be found in my [DataCamp repository](https://github.com/trenton3983/DataCamp/tree/master/data)

### Data File Objects

In [None]:
census_csv_data = 'data/intro_to_databases_in_python/census.csv'
census_sql_data = 'sqlite:///data/intro_to_databases_in_python/census.sqlite'
employees_sql_data = 'sqlite:///data/intro_to_databases_in_python/employees.sqlite'

# Introduction to Databases in Python

***Course Description***

In this Python SQL course, you'll learn the basics of using Structured Query Language (SQL) with Python. This is useful because, whether you like it or not, databases are ubiquitous and, as a data scientist, you'll need to interact with them constantly. The Python SQL toolkit SQLAlchemy provides an accessible and intuitive way to query, build & write to SQLite, MySQL and PostgreSQL databases (among many others), all of which you will encounter in the daily life of a data scientist.

## 1: Basics of Relational Databases

In this chapter, you will become acquainted with the fundamentals of Relational Databases and the Relational Model. You will learn how to connect to a database and then interact with it by writing basic SQL queries, both in raw SQL as well as with SQLAlchemy, which provides a Pythonic way of interacting with databases.

### 1.a: Introduction to Databases

#### A database consists of tables

![alt text](https://github.com/trenton3983/DataCamp/blob/master/Images/intro_to_databases_in_python/tables.JPG?raw=true "Tables")

#### Table consists of columns and rows

![alt text](https://github.com/trenton3983/DataCamp/blob/master/Images/intro_to_databases_in_python/columns_rows.JPG?raw=true "Columns and Rows")

#### Tables can be related

![alt text](https://github.com/trenton3983/DataCamp/blob/master/Images/intro_to_databases_in_python/related.JPG?raw=true "Related")

### Exercises

#### Relational Model

Which of the following is not part of the relational model?

1. Tables
2. Columns
3. Rows
4. __Dimensions__
5. Relationships


### 1.b: Connecting to a Database

#### Meet SQLAlchemy

* Two Main Pieces
    * Core (Relational Model focused)
    * ORM (User Data Model focused)
        * Object Relational Mapper

#### There are many types of databases

* SQLite
* PostgreSQL
* MySQL
* MS SQL
* Oracle
* Many more

#### Connecting to a database

```python
In [1]: from sqlalchemy import create_engine
In [2]: engine = create_engine('sqlite:///census_nyc.sqlite')
In [3]: connection = engine.connect()
```

* Engine: common interface to the database from SQLAlchemy
* Connection string: All the details required to find the database (and login, if necessary)

#### A word on connection strings
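* 'sqlite:///census_nyc.sqlite'
* Format is dialect+driver://location; here 'sqlite' is the dialect (no driver is named, so SQLAlchemy's default SQLite driver is used) and 'census_nyc.sqlite' is the filename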
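To see how SQLAlchemy splits a connection string into these parts, the sketch below parses the slide's URL with `make_url()`, which only parses and never opens a database. The absolute-path and in-memory URLs are added for comparison; `/tmp/census_nyc.sqlite` is a hypothetical path.

```python
from sqlalchemy.engine import make_url

# Parse the connection string from the slides; nothing is opened or created
url = make_url("sqlite:///census_nyc.sqlite")
print(url.get_backend_name())  # the dialect name
print(url.database)            # relative path to the SQLite file

# Four slashes give an absolute path; no path at all means an in-memory database
absolute_url = make_url("sqlite:////tmp/census_nyc.sqlite")
memory_url = make_url("sqlite://")
print(absolute_url.database, memory_url.database)
```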

#### What’s in your database?

* Before querying your database, you’ll want to know what is in it: what the tables are, for example:

```python
In [1]: from sqlalchemy import create_engine
In [2]: engine = create_engine('sqlite:///census_nyc.sqlite')
In [3]: print(engine.table_names())
Out[3]: ['census', 'state_fact']
```
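`engine.table_names()` is deprecated as of SQLAlchemy 1.4 and removed in 2.0; the inspector returned by `inspect(engine)` provides `get_table_names()` instead. A minimal sketch, assuming a throwaway in-memory database with two empty tables named like the slide's (a hypothetical stand-in for `census_nyc.sqlite`; the `state_fact` columns are placeholders):

```python
from sqlalchemy import create_engine, inspect, text

# Hypothetical stand-in for census_nyc.sqlite: an empty in-memory database
engine = create_engine("sqlite://")
with engine.begin() as conn:  # begin() commits the DDL when the block exits
    conn.execute(text("CREATE TABLE census (state VARCHAR(30), sex VARCHAR(1), "
                      "age INTEGER, pop2000 INTEGER, pop2008 INTEGER)"))
    conn.execute(text("CREATE TABLE state_fact (id INTEGER, name VARCHAR(256))"))

# The Inspector reads schema information straight from the database
table_names = inspect(engine).get_table_names()
print(table_names)
```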

#### Reflection

* Reflection reads the database and builds SQLAlchemy Table objects

```python
In [1]: from sqlalchemy import MetaData, Table
In [2]: metadata = MetaData()
In [3]: census = Table('census', metadata, autoload=True, autoload_with=engine)
In [4]: print(repr(census))
Out[4]:
Table('census', MetaData(bind=None), Column('state',
VARCHAR(length=30), table=<census>), Column('sex',
VARCHAR(length=1), table=<census>), Column('age', INTEGER(),
table=<census>), Column('pop2000', INTEGER(), table=<census>),
Column('pop2008', INTEGER(), table=<census>), schema=None)
```
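A minimal reflection sketch against an in-memory SQLite database whose `census` table is created with raw DDL (a hypothetical stand-in for the course file). It passes only `autoload_with=engine`, which triggers reflection on its own; the `autoload=True` flag used in these notes is deprecated from SQLAlchemy 1.4 onward. Each reflected column carries the type read back from the database.

```python
from sqlalchemy import Integer, MetaData, String, Table, create_engine, text

engine = create_engine("sqlite://")  # hypothetical in-memory stand-in
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE census (state VARCHAR(30), sex VARCHAR(1), "
                      "age INTEGER, pop2000 INTEGER, pop2008 INTEGER)"))

metadata = MetaData()
# Reflection: SQLAlchemy queries the database and builds the Column objects itself
census = Table("census", metadata, autoload_with=engine)

for column in census.columns:
    print(column.name, column.type)
```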

### Exercises

#### Engines and Connection Strings

Alright, it's time to create your first engine! An engine is just a common interface to a database, and the information it requires to connect to one is contained in a connection string, such as **sqlite:///census_nyc.sqlite**. Here, **sqlite** is the database dialect (no driver is named, so SQLAlchemy uses its default SQLite driver), while **census_nyc.sqlite** is a SQLite file contained in the local directory.

You can learn a lot more about connection strings in the [SQLAlchemy documentation](http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls).

Your job in this exercise is to create an engine that connects to a local SQLite file named **census.sqlite**. Then, print the names of the tables it contains using the **.table_names()** method. Note that when you just want to print the table names, you do not need to use **engine.connect()** after creating the engine.

**Instructions**

* Import **create_engine** from the **sqlalchemy** module.
* Using the **create_engine()** function, create an engine for a local file named **census.sqlite** with **sqlite** as the driver. Be sure to enclose the connection string within quotation marks.
* Print the output from the **.table_names()** method on the **engine**.

In [None]:
# Import create_engine - at top of notebook

# Create an engine that connects to the census.sqlite file: engine
engine = create_engine(census_sql_data)

# Print table names
engine.table_names()

#### Autoloading Tables from a Database

SQLAlchemy can be used to automatically load tables from a database using something called reflection. Reflection is the process of reading the database and building the metadata based on that information. It's the opposite of creating a Table by hand and is very useful for working with existing databases. To perform reflection, you need to import the **Table** object from the SQLAlchemy package. Then, you use this **Table** object to read your table from the engine and autoload the columns. Using the **Table** object in this manner is a lot like passing arguments to a function. For example, to autoload the columns with the engine, you have to specify the keyword arguments **autoload=True** and **autoload_with=engine** to **Table()**.

In this exercise, your job is to reflect the **census** table available on your **engine** into a variable called **census**. The metadata has already been loaded for you using **MetaData()** and is available in the variable **metadata**.

**Instructions**

* Import the **Table** object from **sqlalchemy**.
* Reflect the **census** table by using the Table object with the arguments:
    * The name of the table as a string (**'census'**).
    * The metadata, contained in the variable **metadata**.
    * **autoload=True**
    * The engine to autoload with - in this case, **engine**.
* Print the details of **census** using the **repr()** function.

In [None]:
# Import Table - at top of Notebook

# Reflect census table from the engine: census
metadata = MetaData()
census = Table('census', metadata, autoload=True, autoload_with=engine)

# Print census table metadata
repr(census)

#### Viewing Table Details

Great job reflecting the **census** table! Now you can begin to learn more about the columns and structure of your table. It is important to get an understanding of your database by examining the column names. This can be done by using the **.columns** attribute and accessing the **.keys()** method. For example, **census.columns.keys()** would return a list of column names of the **census** table.

Following this, we can use the metadata container to find out more details about the reflected table such as the columns and their types. For example, table objects are stored in the **metadata.tables** dictionary, so you can get the metadata of your **census** table with **metadata.tables['census']**. This is similar to your use of the **repr()** function on the census table from the previous exercise.

**Instructions**

* Reflect the **census** table as you did in the previous exercise using the **Table()** function.
* Print a list of column names of the **census** table by applying the **.keys()** method to **census.columns**.
* Print the details of the **census** table using the **metadata.tables** dictionary along with the **repr()** function. To do this, first access the **'census'** key of the **metadata.tables** dictionary, and place this inside the provided **repr()** function.
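The sketch below declares a `census` table by hand (no database needed, purely for illustration) to show that the `Table` object and the entry in `metadata.tables` are one and the same, and that `.columns.keys()` returns the column names. It also uses `census.c`, SQLAlchemy's shorthand for `census.columns`.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table

metadata = MetaData()
# Declared by hand; a reflected Table is registered on its MetaData the same way
census = Table(
    "census", metadata,
    Column("state", String(30)),
    Column("sex", String(1)),
    Column("age", Integer),
    Column("pop2000", Integer),
    Column("pop2008", Integer),
)

column_names = census.columns.keys()        # list of column names
registered = list(metadata.tables)          # names of every Table on this MetaData
same_object = metadata.tables["census"] is census

print(column_names)
print(registered, same_object)
print(repr(metadata.tables["census"]))
print(census.c.keys() == column_names)      # census.c is an alias for census.columns
```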

In [None]:
# Reflect the census table from the engine: census
census = Table('census', metadata, autoload=True, autoload_with=engine)

# Print the column names
census.columns.keys()

In [None]:
# Print full table metadata
repr(metadata.tables['census'])

### 1.c: Introduction to SQL

#### SQL Statements

* Select, Insert, Update & Delete data
* Create & Alter tables (the structure that holds the data)

#### Basic SQL querying

```sql
SELECT column_name FROM table_name
SELECT pop2008 FROM People
SELECT * FROM People
```

#### Basic SQL querying

```python
In [1]: from sqlalchemy import create_engine
In [2]: engine = create_engine('sqlite:///census_nyc.sqlite')
In [3]: connection = engine.connect()
In [4]: stmt = 'SELECT * FROM people'
In [5]: result_proxy = connection.execute(stmt)
In [6]: results = result_proxy.fetchall()
```

#### ResultProxy vs ResultSet

```python
In [5]: result_proxy = connection.execute(stmt)
In [6]: results = result_proxy.fetchall()
```

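* ResultProxy: the object returned by .execute(), used to fetch the query's data
* ResultSet: the actual data, returned by a fetch method such as .fetchall()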

#### Handling ResultSets

```python
In [1]: first_row = results[0]
In [2]: print(first_row)
Out[2]: ('Illinois', 'M', 0, 89600, 95012)
In [4]: print(first_row.keys())
Out[4]: ['state', 'sex', 'age', 'pop2000', 'pop2008']
In [6]: print(first_row.state)
Out[6]: 'Illinois'
```

#### SQLAlchemy to Build Queries

* Provides a **<font color=purple><u>Pythonic</u></font>** way to build SQL statements
* Hides differences between backend database types

#### SQLAlchemy querying

```python
In [4]: from sqlalchemy import Table, MetaData, select
In [5]: metadata = MetaData()
In [6]: census = Table('census', metadata, autoload=True, autoload_with=engine)
In [7]: stmt = select([census])
In [8]: results = connection.execute(stmt).fetchall()
```

#### SQLAlchemy Select Statement

* Requires a list of one or more Tables or Columns
* Using a table will select all the columns in it

```python
In [9]: stmt = select([census])
In [10]: print(stmt)
Out[10]:
SELECT census.state, census.sex, census.age, census.pop2000, census.pop2008
FROM census
```

### Exercises

#### Selecting data from a Table: raw SQL

Using what we just learned about SQL and applying the **.execute()** method on our connection, we can leverage a raw SQL query to query all the records in our **census** table. The object returned by the **.execute()** method is a **ResultProxy**. On this ResultProxy, we can then use the **.fetchall()** method to get our results - that is, the **ResultSet**.
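Passing a plain string to `.execute()`, as this exercise does, works in SQLAlchemy 1.3 but is deprecated in 1.4 and rejected in 2.0; the portable form wraps the SQL in `text()`. The sketch below also uses a named bind parameter (`:state`), which keeps values out of the SQL string. It runs against an in-memory database holding two hypothetical toy rows, not census data.

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # in-memory stand-in for census.sqlite
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE census (state VARCHAR(30), sex VARCHAR(1), "
                      "age INTEGER, pop2000 INTEGER, pop2008 INTEGER)"))
    # Hypothetical toy rows; a list of dicts runs the INSERT once per row
    conn.execute(
        text("INSERT INTO census VALUES (:state, :sex, :age, :pop2000, :pop2008)"),
        [
            {"state": "Illinois", "sex": "M", "age": 0, "pop2000": 10, "pop2008": 12},
            {"state": "Texas", "sex": "F", "age": 1, "pop2000": 20, "pop2008": 25},
        ],
    )

with engine.connect() as connection:
    # .execute() returns the result object; .fetchall() pulls every row out of it
    result_proxy = connection.execute(
        text("SELECT * FROM census WHERE state = :state"), {"state": "Texas"}
    )
    results = result_proxy.fetchall()

print(results)
```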

In this exercise, you'll use a traditional SQL query. In the next exercise, you'll move to SQLAlchemy and begin to understand its advantages. Go for it!

**Instructions**

* Build a SQL statement to query all the columns from **census** and store it in **stmt**. Note that your SQL statement must be a string.
* Use the **.execute()** and **.fetchall()** methods on **connection** and store the result in **results**. Remember that **.execute()** comes before **.fetchall()** and that **stmt** needs to be passed to **.execute()**.
* Print **results**.

In [None]:
engine = create_engine(census_sql_data)
connection = engine.connect()

In [None]:
# Build select statement for census table: stmt
stmt = 'SELECT * FROM census'

# Execute the statement and fetch the results: results
results = connection.execute(stmt).fetchall()
results[:5]

#### Selecting data from a Table with SQLAlchemy

It's now time to build your first select statement using SQLAlchemy. SQLAlchemy provides a nice "Pythonic" way of interacting with databases. So rather than dealing with the differences between specific dialects of traditional SQL such as MySQL or PostgreSQL, you can leverage the Pythonic framework of SQLAlchemy to streamline your workflow and more efficiently query your data. For this reason, it is worth learning even if you may already be familiar with traditional SQL.

In this exercise, you'll once again build a statement to query all records from the **census** table. This time, however, you'll make use of the **select()** function of the **sqlalchemy** module. This function requires a list of tables or columns as the only required argument.

**Table** and **MetaData** have already been imported. The metadata is available as **metadata** and the connection to the database as **connection**.

**Instructions**

* Import **select** from the **sqlalchemy** module.
* Reflect the **census** table. This code is already written for you.
* Create a query using the **select()** function to retrieve the **census** table. To do so, pass a list to **select()** containing a single element: **census**.
* Print **stmt** to see the actual SQL query being created. This code has been written for you.
* Using the provided **print()** function, print all the records from the **census** table. To do this:
    * Use the **.execute()** method on **connection** with **stmt** as the argument to retrieve the ResultProxy.
    * Use **.fetchall()** on **connection.execute(stmt)** to retrieve the ResultSet.

In [None]:
# Import select - at top of Notebook

engine = create_engine(census_sql_data)
connection = engine.connect()
metadata = MetaData()

# Reflect census table via engine: census
census = Table('census', metadata, autoload=True, autoload_with=engine)

# Build select statement for census table: stmt
stmt = select([census])

# Print the emitted statement to see the SQL emitted
print(stmt)

In [None]:
# Execute the statement and print the results
results = connection.execute(stmt).fetchall()
results[:5]

#### Handling a ResultSet

Recall the differences between a ResultProxy and a ResultSet:

* ResultProxy: The object returned by the **.execute()** method. It can be used in a variety of ways to get the data returned by the query.
* ResultSet: The actual data asked for in the query when using a fetch method such as **.fetchall()** on a ResultProxy.

This separation between the ResultSet and ResultProxy allows us to fetch as much or as little data as we desire.
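A small sketch of fetching in pieces from one result object, assuming a trimmed-down, hypothetical `census` table in an in-memory database. It uses the SQLAlchemy 1.4/2.0 calling style `select(census)`; the list form `select([census])` used in these notes was removed in 2.0.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)), Column("age", Integer), Column("pop2008", Integer),
)
metadata.create_all(engine)
with engine.begin() as conn:
    # Hypothetical toy rows: ages 0 to 4 for a single state
    conn.execute(census.insert(), [
        {"state": "Texas", "age": age, "pop2008": 100 + age} for age in range(5)
    ])

with engine.connect() as connection:
    result_proxy = connection.execute(select(census).order_by(census.columns.age))
    first = result_proxy.fetchone()       # one row
    next_two = result_proxy.fetchmany(2)  # up to two more rows
    rest = result_proxy.fetchall()        # whatever is left

print(first, next_two, rest)
```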

Once we have a ResultSet, we can use Python to access all the data within it by column name and by list style indexes. For example, you can get the first row of the results by using **results[0]**. With that first row then assigned to a variable **first_row**, you can get data from the first column by either using **first_row[0]** or by column name such as **first_row['column_name']**. You'll now practice exactly this using the ResultSet you obtained from the **census** table in the previous exercise. It is stored in the variable **results**. Enjoy!
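In SQLAlchemy 1.4 and 2.0, rows behave like named tuples: positional and attribute access work as before, while lookup by a column-name string goes through `row._mapping` (the `first_row['column_name']` form is deprecated in 1.4 and no longer works in 2.0). A sketch on one hypothetical toy row in an in-memory database:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)), Column("sex", String(1)), Column("age", Integer),
    Column("pop2000", Integer), Column("pop2008", Integer),
)
metadata.create_all(engine)
with engine.begin() as conn:
    # One hypothetical toy row
    conn.execute(census.insert(), {"state": "Illinois", "sex": "M", "age": 0,
                                   "pop2000": 10, "pop2008": 12})

with engine.connect() as connection:
    first_row = connection.execute(select(census)).first()

print(first_row[0])                 # by position
print(first_row.state)              # by attribute, named-tuple style
print(first_row._mapping["state"])  # by column name, through the mapping view
```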

**Instructions**

* Extract the first row of **results** and assign it to the variable **first_row**.
* Print the value of the first column in **first_row**.
* Print the value of the **'state'** column in **first_row**.

In [None]:
# Get the first row of the results by using an index: first_row
first_row = results[0]

# Print the first row of the results
print(first_row)

# Print the first column of the first row by using an index
print(first_row[0])

# Print the 'state' column of the first row by using its name
print(first_row['state'])

#### Coming up Next...

* Beef up your SQL querying skills
* Learn how to extract all types of useful information from your databases using SQLAlchemy
* Learn how to create and write to relational databases
* Deep dive into the US census dataset

## 2: Applying Filtering, Ordering and Grouping to Queries

In this chapter, you will build on the database knowledge you began acquiring in the previous chapter by writing more nuanced queries that allow you to filter, order, and count your data, all within the Pythonic framework provided by SQLAlchemy!

### 2.a: Filtering and Targeted Data

#### Where Clauses

```python
In [1]: stmt = select([census])
In [2]: stmt = stmt.where(census.columns.state == 'California')
In [3]: results = connection.execute(stmt).fetchall()
In [4]: for result in results:
   ...:     print(result.state, result.age)
Out[4]:
California 0
California 1
California 2
California 3
California 4
...
```

* Restrict data returned by a query based on boolean conditions
* Compare a column against a value or another column
* Often used comparisons: '==', '<=', '>=', or '!='

#### Expressions

* Provide more complex conditions than simple operators
* E.g. in_(), like(), between()
* Many more in documentation
* Available as methods on a Column

```python
In [1]: stmt = select([census])
In [2]: stmt = stmt.where(census.columns.state.startswith('New'))
In [3]: for result in connection.execute(stmt):
            print(result.state, result.pop2000)
Out[3]:
New Jersey 56983
New Jersey 56686
New Jersey 57011
...
```
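A runnable sketch of several Column expressions on hypothetical toy rows in an in-memory SQLite database, using the SQLAlchemy 1.4/2.0 style `select(column)` rather than a list. `states_where()` is a hypothetical helper written for this sketch; results are sorted because the query has no ORDER BY.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, select

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)), Column("sex", String(1)), Column("age", Integer),
    Column("pop2000", Integer), Column("pop2008", Integer),
)
metadata.create_all(engine)
# Hypothetical toy rows (state, sex, age, pop2000, pop2008), not real census figures
toy_rows = [
    ("New York", "M", 21, 100, 120), ("New York", "F", 37, 110, 115),
    ("New Jersey", "F", 0, 90, 95), ("California", "F", 21, 300, 320),
    ("California", "M", 50, 280, 290), ("Texas", "M", 8, 200, 230),
]
with engine.begin() as conn:  # begin() commits when the block exits
    conn.execute(census.insert(), [dict(zip(census.columns.keys(), r)) for r in toy_rows])


def states_where(condition):
    """Hypothetical helper: sorted state values of every row matching condition."""
    stmt = select(census.columns.state).where(condition)
    with engine.connect() as connection:
        return sorted(row.state for row in connection.execute(stmt))


in_list = states_where(census.columns.state.in_(["New York", "Texas"]))
starts_new = states_where(census.columns.state.startswith("New"))
like_cal = states_where(census.columns.state.like("Cal%"))
age_range = states_where(census.columns.age.between(20, 39))  # bounds are inclusive
print(in_list, starts_new, like_cal, age_range, sep="\n")
```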

#### Conjunctions

* Allow us to have multiple criteria in a where clause
* E.g. and_(), not_(), or_()

```python
In [1]: from sqlalchemy import or_
In [2]: stmt = select([census])
In [3]: stmt = stmt.where(or_(census.columns.state == 'California',
                              census.columns.state == 'New York'))
In [4]: for result in connection.execute(stmt):
            print(result.state, result.sex)
Out[4]:
New York M
…
California F
```
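A sketch combining `and_()`, `or_()` and `not_()` on hypothetical toy rows in an in-memory SQLite database (SQLAlchemy 1.4/2.0 `select()` style). The first condition mirrors the New York example from the exercises below; printing a statement shows the nested WHERE clause SQLAlchemy will send.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table, and_, create_engine,
                        not_, or_, select)

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)), Column("sex", String(1)), Column("age", Integer),
    Column("pop2000", Integer), Column("pop2008", Integer),
)
metadata.create_all(engine)
# Hypothetical toy rows (state, sex, age, pop2000, pop2008), not real census figures
toy_rows = [
    ("New York", "M", 21, 100, 120), ("New York", "F", 37, 110, 115),
    ("New Jersey", "F", 0, 90, 95), ("California", "F", 21, 300, 320),
    ("California", "M", 50, 280, 290), ("Texas", "M", 8, 200, 230),
]
with engine.begin() as conn:
    conn.execute(census.insert(), [dict(zip(census.columns.keys(), r)) for r in toy_rows])

cols = census.columns
# New York records aged 21 or 37
ny_stmt = select(cols.state, cols.age).where(
    and_(cols.state == "New York", or_(cols.age == 21, cols.age == 37))
)
# California records whose sex is not 'M'
ca_stmt = select(cols.state, cols.sex).where(and_(cols.state == "California", cols.sex != "M"))
# Everything except Texas
not_tx_stmt = select(cols.state).where(not_(cols.state == "Texas"))

print(ny_stmt)  # the generated SQL, with AND wrapping the OR group

with engine.connect() as connection:
    ny_rows = connection.execute(ny_stmt).fetchall()
    ca_rows = connection.execute(ca_stmt).fetchall()
    not_tx_rows = connection.execute(not_tx_stmt).fetchall()
```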

### Exercises

#### Connecting to a PostgreSQL Database

In these exercises, you will be working with real databases hosted on the cloud via Amazon Web Services (AWS)!

Let's begin by connecting to a PostgreSQL database. When connecting to a PostgreSQL database, many prefer to use the psycopg2 database driver, as it supports practically all of PostgreSQL's features efficiently and is SQLAlchemy's default driver for the PostgreSQL dialect.

You might recall from Chapter 1 that we use the **create_engine()** function and a connection string to connect to a database.

There are four components to the connection string in this exercise: the dialect and driver (**'postgresql+psycopg2://'**), followed by the username and password (**'student:datacamp'**), followed by the host and port (**'@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:5432/'**), and finally the database name (**'census'**). You will have to pass this string as an argument to **create_engine()** in order to connect to the database.

**Instructions**

* Import **create_engine** from **sqlalchemy**.
* Create an engine to the **census** database by concatenating the following strings:
    * **'postgresql+psycopg2://'**
    * **'student:datacamp'**
    * **'@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com'**
    * **':5432/census'**
* Use the **.table_names()** method on **engine** to print the table names.
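Instead of concatenating strings, the same connection string can be assembled from named parts with `URL.create()` (SQLAlchemy 1.4 and later), and `make_url()` splits a string back into those parts. This sketch only builds and parses the URL; handing it to `create_engine()` would additionally need the psycopg2 package installed and the course server reachable, neither of which is assumed here.

```python
from sqlalchemy.engine import URL, make_url

# Build the exercise's connection string from its four components
url = URL.create(
    drivername="postgresql+psycopg2",  # dialect + driver
    username="student",
    password="datacamp",
    host="postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com",
    port=5432,
    database="census",
)
print(url.render_as_string(hide_password=False))

# Going the other way: parse the string back into its components
parsed = make_url(
    "postgresql+psycopg2://student:datacamp"
    "@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:5432/census"
)
print(parsed.get_backend_name(), parsed.get_driver_name(),
      parsed.host, parsed.port, parsed.database)
```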

In [None]:
# Use with local file
engine = create_engine(census_sql_data)
connection = engine.connect()
metadata = MetaData()
census = Table('census', metadata, autoload=True, autoload_with=engine)

In [None]:
# Create an engine to the census database - exercise
# engine = create_engine('postgresql+psycopg2://student:datacamp@postgresql.csrrinzqubik.us-east-1.rds.amazonaws.com:5432/census')

# Use the .table_names() method on the engine to print the table names
print(engine.table_names())

#### Filter data selected from a Table - Simple

Having connected to the database, it's now time to practice filtering your queries!

As mentioned in the video, a **where()** clause is used to filter the data that a statement returns. For example, to select all the records from the **census** table where the sex is Female (or **'F'**) we would do the following:

**select([census]).where(census.columns.sex == 'F')**

In addition to **==**, we can use basically any Python comparison operator (such as **<=**, **!=**, etc.) in the **where()** clause.

**Instructions**

* Select all records from the census table by passing in **census** as a list to **select()**.
* Append a where clause to **stmt** to return only the records with a **state** of **'New York'**.
* Execute the statement **stmt** using **.execute()** and retrieve the results using **.fetchall()**.
* Iterate over **results** and print the **age**, **sex** and **pop2008** columns from each record. For example, you can print out the **age** of **result** with **result.age**.

In [None]:
# Create a select query: stmt
stmt = select([census])

# Add a where clause to filter the results to only those for New York
stmt = stmt.where(census.columns.state == 'New York')

# Execute the query to retrieve all the data returned: results
results = connection.execute(stmt).fetchall()

# Loop over the results and print the age, sex, and pop2008
for i, result in enumerate(results):
    if i < 7:
        print(result.age, result.sex, result.pop2008)

In [None]:
results[:7]

#### Filter data selected from a Table - Expressions

In addition to standard Python comparators, we can also use methods such as **in_()** to create more powerful **where()** clauses. You can see a full list of expressions in the [SQLAlchemy Documentation](http://docs.sqlalchemy.org/en/latest/core/sqlelement.html#module-sqlalchemy.sql.expression).

We've already created a list, **states**, containing some of the most populous states.

**Instructions**

* Select all records from the **census** table by passing it in as a list to **select()**.
* Append a where clause to return all the records with a **state** in the **states** list. Use **in_(states)** on **census.columns.state** to do this.
* Loop over the ResultProxy **connection.execute(stmt)** and print the **state** and **pop2000** columns from each record.

In [None]:
# Slide example: states whose names start with 'New' (first 7 rows only)
stmt = select([census]).where(census.columns.state.startswith('New'))
for result in connection.execute(stmt.limit(7)):
    print(result.state, result.pop2000)

In [None]:
states = ['New York', 'California', 'Texas']

# Create a query for the census table: stmt
stmt = select([census])

# Append a where clause to match all the states in_ the list states
stmt = stmt.where(census.columns.state.in_(states))

# Loop over the ResultProxy and print the state and its population in 2000
for i, result in enumerate(connection.execute(stmt)):
    if i < 7:
        print(result.state, result.pop2000)

#### Filter data selected from a Table - Advanced

You're really getting the hang of this! SQLAlchemy also allows users to use conjunctions such as **and_()**, **or_()**, and **not_()** to build more complex filtering. For example, we can get a set of records for people in New York who are 21 or 37 years old with the following code:

```python
select([census]).where(and_(census.columns.state == 'New York',
                            or_(census.columns.age == 21,
                                census.columns.age == 37)))
```

**Instructions**

* Import **and_** from the **sqlalchemy** module.
* Select all records from the **census** table.
* Append a where clause to filter all the records whose **state** is **'California'**, and whose **sex** is not **'M'**.
* Iterate over the ResultProxy and print the **age** and **sex** columns from each record.

In [None]:
# Build a query for the census table: stmt
stmt = select([census])

# Append a where clause to select only non-male records from California using and_
# The state of California with a non-male sex
stmt = stmt.where(and_(census.columns.state == 'California',
                       census.columns.sex != 'M'))

# Loop over the ResultProxy printing the age and sex
for i, result in enumerate(connection.execute(stmt)):
    if i < 7:
        print(result.age, result.sex)

### 2.b: Ordering Query Results

#### Order by Clauses

* Allows us to control the order in which records are returned in the query results
* Available as the order_by() method on statements

```python
In [1]: print(results[:10])
Out[1]: [('Illinois',), …]
In [3]: stmt = select([census.columns.state])
In [4]: stmt = stmt.order_by(census.columns.state)
In [5]: results = connection.execute(stmt).fetchall()
In [6]: print(results[:10])
Out[6]: [('Alabama',), …]
```

#### Order by Descending

* Wrap the column with desc() in the order_by() clause

#### Order by Multiple

* Just separate multiple columns with a comma
* Orders completely by the first column
* Then, where there are duplicates in the first column, orders by the second column
* Repeats until all columns are ordered

```python
In [6]: print(results)
Out[6]: ('Alabama', 'M')
In [7]: stmt = select([census.columns.state, census.columns.sex])
In [8]: stmt = stmt.order_by(census.columns.state, census.columns.sex)
In [9]: results = connection.execute(stmt).fetchall()
In [10]: for row in results:
    ...:     print(row)
Out[10]:
('Alabama', 'F')
('Alabama', 'F')
…
('Alabama', 'M')
```
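A runnable sketch of ascending, descending and multi-column ordering on hypothetical toy rows in an in-memory SQLite database, written in the SQLAlchemy 1.4/2.0 style where `select()` takes columns directly instead of a list. `desc()` flips one column; in a multi-column `order_by()`, later columns only break ties left by earlier ones.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table, create_engine, desc,
                        select)

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)), Column("sex", String(1)), Column("age", Integer),
    Column("pop2000", Integer), Column("pop2008", Integer),
)
metadata.create_all(engine)
# Hypothetical toy rows (state, sex, age, pop2000, pop2008), not real census figures
toy_rows = [
    ("New York", "M", 21, 100, 120), ("New York", "F", 37, 110, 115),
    ("New Jersey", "F", 0, 90, 95), ("California", "F", 21, 300, 320),
    ("California", "M", 50, 280, 290), ("Texas", "M", 8, 200, 230),
]
with engine.begin() as conn:
    conn.execute(census.insert(), [dict(zip(census.columns.keys(), r)) for r in toy_rows])

state, age = census.columns.state, census.columns.age
with engine.connect() as connection:
    asc_states = [r.state for r in connection.execute(select(state).order_by(state))]
    desc_states = [r.state for r in connection.execute(select(state).order_by(desc(state)))]
    # Ascending by state, then descending by age within each state
    by_state_then_age = [
        tuple(r) for r in connection.execute(select(state, age).order_by(state, desc(age)))
    ]

print(asc_states, desc_states, by_state_then_age, sep="\n")
```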

### Exercises

In [None]:
# Use with local file
engine = create_engine(census_sql_data)
connection = engine.connect()
metadata = MetaData()
census = Table('census', metadata, autoload=True, autoload_with=engine)

#### Ordering by a Single Column

To sort the result output by a field, we use the **.order_by()** method. By default, the **.order_by()** method sorts from lowest to highest on the supplied column. You just have to pass in the name of the column you want sorted to **.order_by()**.

In the video, for example, Jason used **stmt.order_by(census.columns.state)** to sort the result output by the **state** column.

**Instructions**

* Select all records of the **state** column from the **census** table. To do this, pass **census.columns.state** as a list to **select()**.
* Append an **.order_by()** to sort the result output by the **state** column.
* Execute **stmt** using the **.execute()** method on **connection** and retrieve all the results using **.fetchall()**.
* Print the first 10 rows of **results**.

In [None]:
# Build a query to select the state column: stmt
stmt = select([census.columns.state])

# Order stmt by the state column
stmt = stmt.order_by(census.columns.state)

# Execute the query and store the results: results
results = connection.execute(stmt).fetchall()

# Print the first 10 results
results[:10]

#### Ordering in Descending Order by a Single Column

You can also use **.order_by()** to sort from highest to lowest by wrapping a column in the **desc()** function. Although you haven't seen this function in action, it generalizes what you have already learned.

Pass **desc()** (for "descending") inside an **.order_by()** with the name of the column you want to sort by. For instance, **stmt.order_by(desc(table.columns.column_name))** sorts **column_name** in descending order.

**Instructions**

* Import **desc** from the **sqlalchemy** module.
* Select all records of the **state** column from the **census** table.
* Append an **.order_by()** to sort the result output by the **state** column in **descending** order. Save the result as **rev_stmt**.
* Execute **rev_stmt** using **connection.execute()** and fetch all the results with **.fetchall()**. Save them as **rev_results**.
* Print the first 10 rows of **rev_results**.

In [None]:
# Build a query to select the state column: stmt
stmt = select([census.columns.state])

# Order stmt by state in descending order: rev_stmt
rev_stmt = stmt.order_by(desc(census.columns.state))

# Execute the query and store the results: rev_results
rev_results = connection.execute(rev_stmt).fetchall()

# Print the first 10 rev_results
rev_results[:10]

#### Ordering by Multiple Columns

We can pass multiple arguments to the **.order_by()** method to order by multiple columns. In fact, we can also sort in ascending or descending order for each individual column. Each column in the **.order_by()** method is fully sorted from left to right. This means that the first column is completely sorted, and then within each matching group of values in the first column, it's sorted by the next column in the **.order_by()** method. This process is repeated until all the columns in the **.order_by()** are sorted.

**Instructions**

* Select all records of the **state** and **age** columns from the **census** table.
* Use **.order_by()** to sort the output of the **state** column in ascending order and **age** in descending order. (NOTE: **desc** is already imported).
* Execute **stmt** using the **.execute()** method on **connection** and retrieve all the results using **.fetchall()**.
* Print the first 20 results.

In [None]:
# Build a query to select state and age: stmt
stmt = select([census.columns.state, census.columns.age])

# Append order by to ascend by state and descend by age
stmt = stmt.order_by(census.columns.state, desc(census.columns.age))

# Execute the statement and store all the records: results
results = connection.execute(stmt).fetchall()

# Print the first 20 results
results[:20]

### 2.c: Counting, Summing and Grouping Data

#### SQL Functions

* E.g. Count, Sum
* from sqlalchemy import func
* More efficient than processing in Python
* Aggregate data

#### Sum Example

```python
In [1]: from sqlalchemy import func
In [2]: stmt = select([func.sum(census.columns.pop2008)])
In [3]: results = connection.execute(stmt).scalar()
In [4]: print(results)
Out[4]: 302876613
```
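A sketch of SQL-side aggregation with `func.sum()` and `.scalar()` on hypothetical toy rows in an in-memory SQLite database (SQLAlchemy 1.4/2.0 `select()` style). Several aggregates selected together come back as one row.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table, create_engine, func,
                        select)

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)), Column("sex", String(1)), Column("age", Integer),
    Column("pop2000", Integer), Column("pop2008", Integer),
)
metadata.create_all(engine)
# Hypothetical toy rows (state, sex, age, pop2000, pop2008), not real census figures
toy_rows = [
    ("New York", "M", 21, 100, 120), ("New York", "F", 37, 110, 115),
    ("New Jersey", "F", 0, 90, 95), ("California", "F", 21, 300, 320),
    ("California", "M", 50, 280, 290), ("Texas", "M", 8, 200, 230),
]
with engine.begin() as conn:
    conn.execute(census.insert(), [dict(zip(census.columns.keys(), r)) for r in toy_rows])

with engine.connect() as connection:
    # .scalar() returns the first column of the first row: handy for one aggregate
    total_2008 = connection.execute(select(func.sum(census.columns.pop2008))).scalar()
    # Two aggregates in one select() come back as a single row
    totals = connection.execute(
        select(func.sum(census.columns.pop2000), func.sum(census.columns.pop2008))
    ).first()

print(total_2008, tuple(totals))
```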

#### Group by

* Allows us to group rows by common values

```python
In [1]: stmt = select([census.columns.sex, func.sum(census.columns.pop2008)])
In [2]: stmt = stmt.group_by(census.columns.sex)
In [3]: results = connection.execute(stmt).fetchall()
In [4]: print(results)
Out[4]: [('F', 153959198), ('M', 148917415)]
```

* Supports multiple columns to group by with a pattern similar to order_by()
* Requires all selected columns to be grouped or aggregated by a function

#### Group by Multiple

```python
In [1]: stmt = select([census.columns.sex, census.columns.age,
                       func.sum(census.columns.pop2008)])
In [2]: stmt = stmt.group_by(census.columns.sex, census.columns.age)
In [3]: results = connection.execute(stmt).fetchall()
In [4]: print(results)
Out[4]:
[('F', 0, 2105442), ('F', 1, 2087705), ('F', 2, 2037280), ('F', 3,
2012742), ('F', 4, 2014825), ('F', 5, 1991082), ('F', 6, 1977923),
('F', 7, 2005470), ('F', 8, 1925725), …
```
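A sketch of grouping by one and by two columns on hypothetical toy rows in an in-memory SQLite database (SQLAlchemy 1.4/2.0 `select()` style). An `order_by()` is added because GROUP BY on its own does not promise any row order.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table, create_engine, func,
                        select)

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)), Column("sex", String(1)), Column("age", Integer),
    Column("pop2000", Integer), Column("pop2008", Integer),
)
metadata.create_all(engine)
# Hypothetical toy rows (state, sex, age, pop2000, pop2008), not real census figures
toy_rows = [
    ("New York", "M", 21, 100, 120), ("New York", "F", 37, 110, 115),
    ("New Jersey", "F", 0, 90, 95), ("California", "F", 21, 300, 320),
    ("California", "M", 50, 280, 290), ("Texas", "M", 8, 200, 230),
]
with engine.begin() as conn:
    conn.execute(census.insert(), [dict(zip(census.columns.keys(), r)) for r in toy_rows])

cols = census.columns
with engine.connect() as connection:
    by_sex = connection.execute(
        select(cols.sex, func.sum(cols.pop2008)).group_by(cols.sex).order_by(cols.sex)
    ).fetchall()
    # Every selected column is either grouped on or aggregated
    by_state_sex = connection.execute(
        select(cols.state, cols.sex, func.sum(cols.pop2008))
        .group_by(cols.state, cols.sex)
        .order_by(cols.state, cols.sex)
    ).fetchall()

print(by_sex)
print(by_state_sex)
```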

#### Handling ResultSets from Functions

* SQLAlchemy auto-generates “column names” for functions in the ResultSet
* The generated names usually follow the pattern <function>_<n>, such as count_1 or sum_1
* Replace them with the label() method

#### Using label()

```python
In [1]: print(results[0].keys())
Out[1]: ['sex', 'sum_1']
In [2]: stmt = select([census.columns.sex,
                       func.sum(census.columns.pop2008).label('pop2008_sum')])
In [3]: stmt = stmt.group_by(census.columns.sex)
In [4]: results = connection.execute(stmt).fetchall()
In [5]: print(results[0].keys())
Out[5]: ['sex', 'pop2008_sum']
```
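A sketch comparing the generated name with a `label()` on hypothetical toy rows in an in-memory SQLite database (SQLAlchemy 1.4/2.0 `select()` style). The keys are read from the result object with `result.keys()`, since the per-row `.keys()` used in the slide is deprecated in 1.4 and gone in 2.0; the labeled value is then available as a row attribute.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table, create_engine, func,
                        select)

engine = create_engine("sqlite://")  # throwaway in-memory database
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)), Column("sex", String(1)), Column("age", Integer),
    Column("pop2000", Integer), Column("pop2008", Integer),
)
metadata.create_all(engine)
# Hypothetical toy rows (state, sex, age, pop2000, pop2008), not real census figures
toy_rows = [
    ("New York", "M", 21, 100, 120), ("New York", "F", 37, 110, 115),
    ("New Jersey", "F", 0, 90, 95), ("California", "F", 21, 300, 320),
    ("California", "M", 50, 280, 290), ("Texas", "M", 8, 200, 230),
]
with engine.begin() as conn:
    conn.execute(census.insert(), [dict(zip(census.columns.keys(), r)) for r in toy_rows])

cols = census.columns
with engine.connect() as connection:
    unlabeled = connection.execute(select(cols.sex, func.sum(cols.pop2008)).group_by(cols.sex))
    unlabeled_keys = list(unlabeled.keys())  # name generated by SQLAlchemy for the sum
    unlabeled.fetchall()                     # consume the remaining rows

    labeled = connection.execute(
        select(cols.sex, func.sum(cols.pop2008).label("pop2008_sum"))
        .group_by(cols.sex)
        .order_by(cols.sex)
    )
    labeled_keys = list(labeled.keys())
    rows = labeled.fetchall()

print(unlabeled_keys, labeled_keys)
print([(r.sex, r.pop2008_sum) for r in rows])
```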

### Exercises

In [None]:
# Use with local file
engine = create_engine(census_sql_data)
connection = engine.connect()
metadata = MetaData()
census = Table('census', metadata, autoload=True, autoload_with=engine)

#### Counting Distinct Data

As mentioned in the video, SQLAlchemy's **func** object provides access to built-in SQL functions that can make operations like counting and summing faster and more efficient.

In the video, Jason used **func.sum()** to get a **sum** of the **pop2008** column of **census** as shown below:

```python
select([func.sum(census.columns.pop2008)])
```

If instead you want to **count** the number of values in **pop2008**, you could use **func.count()** like this:

```python
select([func.count(census.columns.pop2008)])
```

Furthermore, if you only want to count the **distinct** values of **pop2008**, you can use the **.distinct()** method:

```python
select([func.count(census.columns.pop2008.distinct())])
```

In this exercise, you will practice using **func.count()** and **.distinct()** to count the number of distinct states in **census**.

So far, you've seen **.fetchall()** and **.first()** used on a ResultProxy to get the results. The ResultProxy also has a **.scalar()** method, which returns the first column of the first row; it is intended for queries that return a single value (one row and one column).

This can be very useful when you are querying for just a count or sum.
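The three `select()` calls above differ only in what is counted. The sketch below runs `func.count()` with and without `.distinct()` against a hypothetical in-memory table and uses `.scalar()` to pull out the single number each query returns. It assumes SQLAlchemy 1.4 or later (positional `select()` arguments); the table and its rows are invented for illustration.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func, select)

demo_engine = create_engine("sqlite://")  # in-memory SQLite database
demo_metadata = MetaData()
demo_census = Table("census", demo_metadata,
                    Column("state", String(30)), Column("pop2008", Integer))

# Made-up rows: 7 records spread over 2 distinct states.
rows = [{"state": s, "pop2008": p} for s, p in [
    ("Illinois", 100), ("Illinois", 110), ("Illinois", 105),
    ("Texas", 200), ("Texas", 210), ("Texas", 205), ("Texas", 215)]]

with demo_engine.begin() as conn:
    demo_metadata.create_all(conn)
    conn.execute(demo_census.insert(), rows)

    # COUNT(state): counts every non-NULL value.
    total_count = conn.execute(
        select(func.count(demo_census.columns.state))).scalar()
    # COUNT(DISTINCT state): counts each value only once.
    distinct_state_count = conn.execute(
        select(func.count(demo_census.columns.state.distinct()))).scalar()

print(total_count, distinct_state_count)  # prints: 7 2
```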

**Instructions**

* Build a **select** statement to **count** the **distinct** values in the **state** field of **census**.
* Execute **stmt** to get the count and store the results as **distinct_state_count**.
* Print the value of **distinct_state_count**.

In [None]:
# Build a query to count the distinct states values: stmt
stmt = select([func.count(census.columns.state.distinct())])

# Execute the query and store the scalar result: distinct_state_count
distinct_state_count = connection.execute(stmt).scalar()

# Print the distinct_state_count
distinct_state_count

#### Count of Records by State

Often, we want a count of records for each distinct value in another column. The **.group_by()** method helps answer this type of query: pass it the column to group on, and combine it with an aggregate function like **sum()** or **count()** in the select statement. Much like the **.order_by()** method, **.group_by()** can take multiple columns as arguments.
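The pattern can be sketched against a hypothetical in-memory table: group on `state` and count the `age` values in each group. The `order_by()` call is added here only to make the output order predictable. It assumes SQLAlchemy 1.4 or later, and the rows are made up.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func, select)

demo_engine = create_engine("sqlite://")  # in-memory SQLite database
demo_metadata = MetaData()
demo_census = Table("census", demo_metadata,
                    Column("state", String(30)), Column("age", Integer))

# Made-up rows: 3 Illinois records and 4 Texas records.
rows = [{"state": s, "age": a} for s, a in [
    ("Illinois", 0), ("Illinois", 1), ("Illinois", 2),
    ("Texas", 0), ("Texas", 1), ("Texas", 2), ("Texas", 3)]]

with demo_engine.begin() as conn:
    demo_metadata.create_all(conn)
    conn.execute(demo_census.insert(), rows)

    # One output row per state, holding the number of age values in it.
    stmt = (
        select(demo_census.columns.state, func.count(demo_census.columns.age))
        .group_by(demo_census.columns.state)
        .order_by(demo_census.columns.state)
    )
    results = conn.execute(stmt).fetchall()

for state, n in results:
    print(state, n)
```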

**Instructions**

* Import **func** from **sqlalchemy**.
* Build a **select** statement to get the value of the state field and a count of the values in the **age** field, and store it as **stmt**.
* Use the **.group_by()** method to group the statement by the **state** column.
* Execute **stmt** using the **connection** to get the count and store the results as **results**.
* Print the keys/column names of the results returned using **results[0].keys()**.

In [None]:
# Build a query to select the state and count of ages by state: stmt
stmt = select([census.columns.state, func.count(census.columns.age)])

# Group stmt by state
stmt = stmt.group_by(census.columns.state)

# Execute the statement and store all the records: results
results = connection.execute(stmt).fetchall()

# Display the first five results
results[:5]

In [None]:
# Print the keys/column names of the results returned
results[0].keys()

#### Determining the Population Sum by State

To avoid confusion with query result column names like **count_1**, we can use the **.label()** method to provide a name for the resulting column. It is chained onto the function expression we are using, and its argument is the name we want to use.

We can pair **func.sum()** with **.group_by()** to get a sum of the population by **state** and use the **label()** method to name the output.

We can also create the **func.sum()** expression before using it in the select statement. We do it the same way we would inside the select statement and store it in a variable. Then we use that variable in the select statement where the **func.sum()** would normally be.
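Because the labeled expression is an ordinary Python object, it can also be reused after the `select()`. The sketch below, against a hypothetical in-memory table with made-up numbers and assuming SQLAlchemy 1.4 or later, passes the same labeled expression to `order_by()` through `desc()` to list states from the largest to the smallest total (Texas 830, then Illinois 315).

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, desc, func, select)

demo_engine = create_engine("sqlite://")  # in-memory SQLite database
demo_metadata = MetaData()
demo_census = Table("census", demo_metadata,
                    Column("state", String(30)), Column("pop2008", Integer))

# Made-up rows for illustration only.
rows = [{"state": s, "pop2008": p} for s, p in [
    ("Illinois", 100), ("Illinois", 110), ("Illinois", 105),
    ("Texas", 200), ("Texas", 210), ("Texas", 205), ("Texas", 215)]]

# Build the labeled aggregate once...
demo_pop_sum = func.sum(demo_census.columns.pop2008).label("population")

with demo_engine.begin() as conn:
    demo_metadata.create_all(conn)
    conn.execute(demo_census.insert(), rows)

    # ...then reuse it in both the column list and the ORDER BY clause.
    stmt = (
        select(demo_census.columns.state, demo_pop_sum)
        .group_by(demo_census.columns.state)
        .order_by(desc(demo_pop_sum))
    )
    results = conn.execute(stmt).fetchall()

print(results)
```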

**Instructions**

* Import **func** from **sqlalchemy**.
* Build an expression to calculate the sum of the values in the **pop2008** field labeled as **'population'**.
* Build a select statement to get the value of the **state** field and the sum of the values in **pop2008**.
* Group the statement by **state** using a **.group_by()** method.
* Execute **stmt** using the **connection** to get the sums and store the results as **results**.
* Print the keys/column names of the results returned using **results[0].keys()**.

In [None]:
# Build an expression to calculate the sum of pop2008 labeled as population
pop2008_sum = func.sum(census.columns.pop2008).label('population')

# Build a query to select the state and sum of pop2008: stmt
stmt = select([census.columns.state, pop2008_sum])

# Group stmt by state
stmt = stmt.group_by(census.columns.state)

# Execute the statement and store all the records: results
results = connection.execute(stmt).fetchall()

# Display the first five results
results[:5]

In [None]:
# Print the keys/column names of the results returned
results[0].keys()

### 2.d: SQLAlchemy and Pandas for Visualization 

#### SQLAlchemy and Pandas

* A pandas DataFrame can be built from a SQLAlchemy ResultSet
* Make sure to set the DataFrame columns to the ResultSet keys
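A minimal end-to-end sketch, assuming SQLAlchemy 1.4 or later and pandas are installed, and using a hypothetical in-memory table with made-up numbers. It takes the column names from the result object's `.keys()` (rather than `results[0].keys()`, which newer SQLAlchemy row objects may not provide) and hands them to the `DataFrame` constructor through `columns=`, so the names are set in one step.

```python
import pandas as pd
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, desc, func, select)

demo_engine = create_engine("sqlite://")  # in-memory SQLite database
demo_metadata = MetaData()
demo_census = Table("census", demo_metadata,
                    Column("state", String(30)), Column("pop2008", Integer))

# Made-up rows for illustration only.
rows = [{"state": s, "pop2008": p} for s, p in [
    ("Illinois", 100), ("Illinois", 110), ("Illinois", 105),
    ("Texas", 200), ("Texas", 210), ("Texas", 205), ("Texas", 215)]]

demo_pop_sum = func.sum(demo_census.columns.pop2008).label("population")

with demo_engine.begin() as conn:
    demo_metadata.create_all(conn)
    conn.execute(demo_census.insert(), rows)
    result = conn.execute(
        select(demo_census.columns.state, demo_pop_sum)
        .group_by(demo_census.columns.state)
        .order_by(desc(demo_pop_sum))
    )
    keys = list(result.keys())                           # ['state', 'population']
    records = [tuple(row) for row in result.fetchall()]  # plain tuples

# Column names come straight from the result keys.
df = pd.DataFrame(records, columns=keys)
print(df)
```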

#### DataFrame Example

```python
In [1]: import pandas as pd
In [2]: df = pd.DataFrame(results)
In [3]: df.columns = results[0].keys()
In [4]: print(df)
Out[4]:
  sex  pop2008_sum
0   F      2105442
1   F      2087705
2   F      2037280
3   F      2012742
4   F      2014825
5   F      1991082
```

#### Graphing

* Once the data is in a DataFrame, we can graph it with pandas and Matplotlib as we normally would

#### Graphing Example

```python
In [1]: import matplotlib.pyplot as plt
In [2]: df[10:20].plot.barh()
In [3]: plt.show()
```

### Exercises

#### SQLAlchemy ResultProxy and pandas DataFrames

We can feed a ResultProxy directly into a pandas DataFrame, which is the workhorse of many Data Scientists in PythonLand. Jason demonstrated this in the video. In this exercise, you'll follow exactly the same approach to convert a ResultProxy into a DataFrame.

**Instructions**

* Import pandas as pd.
* Create a DataFrame df using pd.DataFrame() on the ResultProxy results.
* Set the columns of the DataFrame df.columns to be the columns from the first result object results[0].keys().
* Print the DataFrame.

In [None]:
# import pandas
import pandas as pd

# Create a DataFrame from the results: df
df = pd.DataFrame(results)

# Set column names
df.columns = results[0].keys()

# Print the Dataframe
print(df)

#### From SQLAlchemy results to a Graph

We can also take advantage of pandas and Matplotlib to build figures of our data. Remember that data visualization is essential for both exploratory data analysis and communication of your data!

**Instructions**

* Import matplotlib.pyplot as plt.
* Create a DataFrame df using pd.DataFrame() on the provided results.
* Set the columns of the DataFrame df.columns to be the columns from the first result object results[0].keys().
* Print the DataFrame df.
* Use the plot.bar() method on df to create a bar plot of the results.
* Display the plot with plt.show().

In [None]:
# Import pyplot as plt from matplotlib
import matplotlib.pyplot as plt

# Create a DataFrame from the results: df
df = pd.DataFrame(results)

# Set Column names
df.columns = results[0].keys()

# Print the DataFrame
print(df)

# Plot the DataFrame
df.plot.bar()
plt.show()

## 3: Advanced SQLAlchemy Queries

Herein, you will learn to perform advanced, and incredibly useful, queries that will enable you to interact with your data in powerful ways.

## 4: Creating and Manipulating your own Databases

In the previous chapters, you interacted with existing databases and queried them in various ways. Now, you will learn how to build your own databases and keep them updated!

## 5: Putting it all together

Here, you will bring together all of the skills you acquired in the previous chapters to work on a real-life project! From connecting to a database, to populating it, to reading and querying it, you will have a chance to apply all the key concepts you learned in this course.