# Subqueries in WHERE Reading

### Introduction

In this lesson, we'll see how we can use subqueries to select rows of data that meet certain conditions.  Let's get started.

### Loading our Data

Let's begin by loading a dataset of financial information about telecom companies.

In [1]:
import pandas as pd
companies_df = pd.read_csv('https://raw.githubusercontent.com/data-eng-10-21/sql-subqueries/main/telecom_companies.csv')

Then we create a connection to a new database.

In [8]:
import sqlite3
conn = sqlite3.connect('telecom.db')

And create a new table called `companies`.

In [10]:
companies_df.to_sql('companies', conn, if_exists='replace')

In [11]:
cursor = conn.cursor()

In [13]:
pd.read_sql('SELECT * FROM companies LIMIT 10;', conn)

|   | index | company_name | market_cap | stock_price |
|---|---|---|---|---|
| 0 | 0 | Comcast | 276.76 | 60.3 |
| 1 | 1 | Verizon | 226.96 | 54.82 |
| 2 | 2 | AT&T | 195.77 | 27.42 |
| 3 | 3 | T-Mobile US | 167.81 | 134.47 |
| 4 | 4 | Charter Communications | 147.21 | 800.83 |
| 5 | 5 | American Tower | 136.18 | 299.22 |
| 6 | 6 | China Mobile | 128.13 | 6.26 |
| 7 | 7 | Nippon Telegraph & Telephone | 107.08 | 29.61 |
| 8 | 8 | SoftBank | 106.81 | 62.35 |
| 9 | 9 | Deutsche Telekom | 100.64 | 21.22 |


### Comparing vs Total

A subquery lets us select rows based on an earlier calculation. Let's say that we only want to select those companies that have a market cap above the average of the listed companies.

We can do so with a subquery:

In [16]:
above_avg_df = pd.read_sql("""SELECT * FROM companies 
WHERE market_cap > (SELECT AVG(market_cap) as avg_market_cap FROM companies);""",conn)
above_avg_df[:2]

|   | index | company_name | market_cap | stock_price |
|---|---|---|---|---|
| 0 | 0 | Comcast | 276.76 | 60.3 |
| 1 | 1 | Verizon | 226.96 | 54.82 |


In the query above, the subquery first computes the average market cap (aliased as `avg_market_cap`), and the outer query then returns just those rows where the company has a market cap above that average.
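To make the role of the subquery concrete, here is a minimal sketch that runs the inner query on its own and then feeds its result to the outer filter. It assumes a small in-memory table (`conn_demo`, a name chosen for illustration) holding only a handful of the rows shown earlier, so the average differs from the one computed over the full dataset.

```python
import sqlite3

# In-memory stand-in for the companies table, using a subset of the rows above.
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE companies (company_name TEXT, market_cap REAL)")
conn_demo.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [
        ("Comcast", 276.76),
        ("Verizon", 226.96),
        ("AT&T", 195.77),
        ("China Mobile", 128.13),
        ("Deutsche Telekom", 100.64),
    ],
)

# Step 1: the inner query alone produces a single value.
avg_cap = conn_demo.execute("SELECT AVG(market_cap) FROM companies").fetchone()[0]

# Step 2: pass that value to the outer filter as a parameter.
two_step = conn_demo.execute(
    "SELECT company_name FROM companies WHERE market_cap > ? ORDER BY company_name",
    (avg_cap,),
).fetchall()

# The subquery version performs both steps in one statement.
one_step = conn_demo.execute(
    """SELECT company_name FROM companies
       WHERE market_cap > (SELECT AVG(market_cap) FROM companies)
       ORDER BY company_name"""
).fetchall()

print(avg_cap)
print(one_step)
print(two_step == one_step)
```

Both approaches select the same companies; the subquery simply avoids the round trip through Python.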

If we want to find the company with the lowest stock price, we could do so like this:

In [19]:
lowest_price = pd.read_sql("""SELECT company_name, MIN(stock_price) as lowest_price FROM companies """, conn)
lowest_price[:2]


|   | company_name | lowest_price |
|---|---|---|
| 0 | Reliance Communications | 0.04 |


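However, a subquery is preferable. The query above returns at most one row, even when several companies share the lowest price, whereas a subquery returns *all* rows that have this lowest price. The query above also relies on SQLite allowing a bare column such as `company_name` next to `MIN()`; some other databases, such as PostgreSQL, reject it without a `GROUP BY`.

> In this dataset only one company has the lowest price, so the query below still returns a single row.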

In [26]:
lowest_price = pd.read_sql("""SELECT company_name, stock_price 
FROM companies 
WHERE stock_price = (SELECT MIN(stock_price) FROM companies)""", conn)
lowest_price


|   | company_name | stock_price |
|---|---|---|
| 0 | Reliance Communications | 0.04 |
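Because the real dataset has no tie for the lowest price, the difference between the two queries is easier to see on made-up data. The sketch below assumes a hypothetical three-row table (the company names and prices are invented for illustration) in which two companies share the minimum price.

```python
import sqlite3

# Hypothetical data: two companies tie for the lowest stock price.
ties = sqlite3.connect(":memory:")
ties.execute("CREATE TABLE companies (company_name TEXT, stock_price REAL)")
ties.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("Alpha Telecom", 0.04), ("Beta Wireless", 0.04), ("Gamma Mobile", 1.25)],
)

# Aggregate form: collapses the table to one row, so one tied company is dropped.
aggregate_rows = ties.execute(
    "SELECT company_name, MIN(stock_price) FROM companies"
).fetchall()

# Subquery form: keeps every row whose price equals the minimum.
subquery_rows = ties.execute(
    """SELECT company_name, stock_price FROM companies
       WHERE stock_price = (SELECT MIN(stock_price) FROM companies)
       ORDER BY company_name"""
).fetchall()

print(len(aggregate_rows))
print(subquery_rows)
```

The aggregate query reports a single company, and which of the tied companies it names is not something to rely on. The subquery version returns both.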


If we instead want to return all companies within ten cents of the lowest stock price, we can add the margin to the subquery's result:

In [27]:
lowest_price = pd.read_sql("""SELECT company_name, stock_price 
FROM companies 
WHERE stock_price < (SELECT MIN(stock_price) FROM companies) + .1""", conn)
lowest_price


|   | company_name | stock_price |
|---|---|---|
| 0 | Sarana Menara Nusantara | 0.09 |
| 1 | Vodafone Idea | 0.11 |
| 2 | Reliance Communications | 0.04 |
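Since the subquery evaluates to a single number, the margin added to it can also be supplied as a query parameter. The following is a rough sketch of that idea: `companies_near_lowest` is a hypothetical helper written for this example, and the in-memory table contains only a few of the rows shown in this lesson.

```python
import sqlite3

def companies_near_lowest(conn, margin):
    """Hypothetical helper: companies priced less than `margin` above the minimum."""
    rows = conn.execute(
        """SELECT company_name FROM companies
           WHERE stock_price < (SELECT MIN(stock_price) FROM companies) + ?
           ORDER BY stock_price""",
        (margin,),
    ).fetchall()
    return [name for (name,) in rows]

# In-memory stand-in containing a subset of the rows from the lesson's outputs.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE companies (company_name TEXT, stock_price REAL)")
demo.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [
        ("Sarana Menara Nusantara", 0.09),
        ("Vodafone Idea", 0.11),
        ("Reliance Communications", 0.04),
        ("China Mobile", 6.26),
        ("Deutsche Telekom", 21.22),
    ],
)

near = companies_near_lowest(demo, 0.1)          # within ten cents of the minimum
only_lowest = companies_near_lowest(demo, 0.01)  # within one cent of the minimum
print(near)
print(only_lowest)
```

Passing the margin with `?` keeps the SQL text fixed while the threshold changes, which avoids building the query with string formatting.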


### Resources

[Subqueries in Select](https://www.essentialsql.com/get-ready-to-learn-sql-server-20-using-subqueries-in-the-select-statement/)

[Correlated Subqueries](https://stackoverflow.com/questions/18909388/using-partition-clause-in-the-subquery)

[Multiple Columns in Subquery](https://stackoverflow.com/questions/583954/how-can-i-select-multiple-columns-from-a-subquery-in-sql-server-that-should-ha)