# Mastering SQL Basics: How to Extract Data Using SELECT, FROM, and WHERE Commands

### Introduction to SQL and relational databases

Structured Query Language (SQL) is a programming language used to manage and manipulate data in a relational database. A relational database is a collection of tables that are related to each other through common columns or fields. Each table consists of rows and columns where rows represent individual records and columns represent attributes or fields of that record.

SQL commands let you create tables and insert, retrieve, update, and delete the data they hold. In this tutorial, we'll focus on using SQL to retrieve data from a database.
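
To make those command types concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table name demo_sales and its values are made up for illustration and are not part of the tutorial's dataset.

```python
import sqlite3

# In-memory database, so the sketch leaves no files behind
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table: columns are attributes, rows are individual records
cur.execute("CREATE TABLE demo_sales (product TEXT, region TEXT, sales REAL)")

# Insert two illustrative records
cur.executemany(
    "INSERT INTO demo_sales VALUES (?, ?, ?)",
    [("Product A", "Region 1", 1200.0), ("Product B", "Region 2", 3400.0)],
)

# Update one record and delete the other
cur.execute("UPDATE demo_sales SET sales = 1500.0 WHERE product = 'Product A'")
cur.execute("DELETE FROM demo_sales WHERE product = 'Product B'")

# Retrieve what is left
rows = cur.execute("SELECT product, region, sales FROM demo_sales").fetchall()
print(rows)  # [('Product A', 'Region 1', 1500.0)]

conn.close()
```

Only the last statement retrieves data; that is the kind of query the rest of this tutorial concentrates on.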

I'll begin by demonstrating how to create a DataFrame with random data using Python and copy it to a table in a SQLite database. This practical example will help you understand how to apply the concepts of SQL to real-world situations.

Please don't be put off by the following Python code. It's just a few lines that create a DataFrame and copy it to a SQLite database. You don't need to understand the code to follow the rest of the tutorial.

In [1]:
import pandas as pd 
import numpy as np
import sqlite3
import sqlalchemy
import random
from datetime import datetime, timedelta
import os

In [2]:
# Define the products and regions
products = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']
regions = ['Region 1', 'Region 2', 'Region 3', 'Region 4', 'Region 5']

# Set the random seed
random.seed(123)

# Generate random sales data
sales_data = []
start_date = datetime(2022, 1, 1)
end_date = datetime(2022, 12, 31)
for i in range(1000):
    product = random.choice(products)
    region = random.choice(regions)
    sale_date = start_date + timedelta(days=random.randint(0, (end_date - start_date).days))
    sales = np.round(random.uniform(1000, 10000), 2)
    sales_data.append([product, region, sale_date, sales])

# Convert the sales data to a Pandas DataFrame
columns = ['Product', 'Region', 'Sale Date', 'Sales']
df = pd.DataFrame(sales_data, columns=columns)

# Save the DataFrame to a CSV file
df.to_csv('sales_data.csv', index=False)


In [3]:
# Define the name of the database and the name of the table
database = 'data/sales.db'
table = 'sales_data'

# Create the data folder if it doesn't exist
os.makedirs('data', exist_ok=True)

# Connect to the database
conn = sqlite3.connect(database)

# Check if the table already exists (the table name is passed as a query parameter)
cursor = conn.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name=?", (table,))
table_exists = cursor.fetchone() is not None

# If the table doesn't exist, create it
if not table_exists:
    df.to_sql(table, conn, index=False)

# If the table already exists, append the new data to it
else:
    df.to_sql(table, conn, index=False, if_exists='append')

# Close the connection
conn.close()


In [4]:
%load_ext sql

The command %load_ext sql is a Jupyter magic command that loads the SQL extension into the notebook.

In [5]:
%sql sqlite:///data/sales.db

'Connected: @data/sales.db'

The command %sql sqlite:///data/sales.db sets up a connection to the SQLite database located at data/sales.db.

The sqlite:/// prefix indicates that we are using the SQLite database engine, and the path data/sales.db specifies the location of the database file on disk.

Once the connection is established, we can execute SQL directly in code cells, using the %sql prefix for a single line or %%sql at the top of a cell, to query and manipulate the data in the database as needed.
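
Outside a notebook, the same kind of database file can be opened with Python's built-in sqlite3 module. The sketch below is one way to do it, not the method used in this tutorial: the helper fetch_all and the stand-in file example.db are hypothetical names, and a temporary folder is used so the code runs on its own. With the tutorial's data you would pass data/sales.db instead.

```python
import os
import sqlite3
import tempfile


def fetch_all(db_path, query):
    """Open the SQLite file at db_path, run one query, and return all rows."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in database file (hypothetical), created only for this sketch
    db_path = os.path.join(tmp_dir, "example.db")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE sales_data (Product TEXT, Sales REAL)")
    conn.execute("INSERT INTO sales_data VALUES ('Product A', 1234.5)")
    conn.commit()
    conn.close()

    # Reconnect by file path, just as the connection string points at a file
    result = fetch_all(db_path, "SELECT Product, Sales FROM sales_data")

print(result)  # [('Product A', 1234.5)]
```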

### Using SELECT to retrieve data from a table

In SQL, SELECT and FROM are the two essential clauses of any query that retrieves data from a database. SELECT specifies the columns you want to retrieve, while FROM identifies the table or tables to read them from. Together they form the foundation of almost every query you write; additional clauses build on them to filter, sort, and join data.

The basic syntax of a SELECT statement is as follows:

```sql
SELECT column1, column2, ...
FROM table_name;
```

In the example below, I'll also use the LIMIT clause to limit the number of rows returned; the query would otherwise return all 1000 rows in the table.

In [6]:
%%sql

SELECT Product, Region, Sales
FROM sales_data
LIMIT 5;

 * sqlite:///data/sales.db
Done.


| Product | Region | Sales |
|---|---|---|
| Product A | Region 3 | 7920.61 |
| Product C | Region 1 | 4412.12 |
| Product E | Region 3 | 8668.78 |
| Product B | Region 2 | 6048.66 |
| Product B | Region 2 | 9153.15 |
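
The same SELECT ... FROM ... LIMIT pattern can be tried in plain Python. The sketch below assumes a tiny in-memory copy of the table with made-up values (not the generated data above) and shows that the column list after SELECT determines which columns come back, while SELECT * returns every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same column names as the tutorial's table; the rows are illustrative only
conn.execute(
    'CREATE TABLE sales_data (Product TEXT, Region TEXT, "Sale Date" TEXT, Sales REAL)'
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        ("Product A", "Region 3", "2022-01-15", 100.0),
        ("Product C", "Region 1", "2022-02-20", 200.0),
        ("Product E", "Region 3", "2022-03-05", 300.0),
    ],
)

# Name two columns and cap the result at two rows
cur = conn.execute("SELECT Product, Sales FROM sales_data LIMIT 2")
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
print(column_names)  # ['Product', 'Sales']
print(len(rows))     # 2

# SELECT * asks for every column in the table
cur = conn.execute("SELECT * FROM sales_data LIMIT 1")
all_columns = [d[0] for d in cur.description]
print(all_columns)   # ['Product', 'Region', 'Sale Date', 'Sales']

conn.close()
```

Without an ORDER BY clause, SQL does not guarantee which rows LIMIT keeps, so the sketch checks only how many rows come back rather than their contents.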
