# SQL - Structured Query Language

- A declarative language for querying and analyzing data stored in relational databases, from small tables to very large datasets.
- Commonly used for Business Intelligence, so companies can make data-driven operational decisions.
- Used to build long-running data pipelines that process data for business purposes. Spark is a good example: it uses SQL syntax to define pipelines that continuously process large amounts of data and turn it into another useful form, either for insights or for generating new data from existing data.

## CREATE

- Creates tables.
- syntax: `CREATE TABLE helloworld (phrase TEXT);`
- Creates a table named `helloworld` with a single column named `phrase` of type `TEXT`, i.e. it can store text.

- `.tables` is a sqlite3 shell command that lists the tables in the database.
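
The same step can be tried from code. This is a minimal sketch, assuming Python's standard `sqlite3` module and an in-memory database; since `.tables` only exists in the sqlite3 shell, the sketch reads the table list from the `sqlite_master` catalog instead.

```python
import sqlite3

# In-memory database (an assumption for this sketch), so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE helloworld (phrase TEXT)")

# `.tables` is a shell command; from code, table names are listed
# in SQLite's sqlite_master catalog table.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)  # ['helloworld']
conn.close()
```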

## INSERT

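- Inserts rows of data into a table.
- syntax: `INSERT INTO helloworld VALUES ('Hello World!');`
- String literals use single quotes in standard SQL. SQLite also accepts double quotes for compatibility, but double quotes normally denote identifiers.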

## COUNT

- `COUNT(*)` counts the number of rows returned by a query.
- syntax: `SELECT COUNT(*) FROM helloworld;`
- expected output after the single insert above: 1
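
To see INSERT and COUNT working together, here is a small sketch, again assuming Python's `sqlite3` module and an in-memory database. The second insert and its `?` placeholder are additions for illustration, not part of the notes above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE helloworld (phrase TEXT)")

# One row inserted with a string literal, as in the notes.
conn.execute("INSERT INTO helloworld VALUES ('Hello World!')")
count_after_one = conn.execute("SELECT COUNT(*) FROM helloworld").fetchone()[0]
print(count_after_one)  # 1

# A second row (hypothetical data), passed through a ? placeholder
# so the driver handles quoting of the value.
conn.execute("INSERT INTO helloworld VALUES (?)", ("Goodbye World!",))
count_after_two = conn.execute("SELECT COUNT(*) FROM helloworld").fetchone()[0]
print(count_after_two)  # 2
conn.close()
```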

## SELECT

- We use it to query the data and get insights.
- syntax: `SELECT * FROM helloworld WHERE phrase = 'Hello World!';`
- We use **WHERE** to specify which rows we are looking for.
- Selecting data is the foundation of SQL.
- basic syntax:    
  SELECT column1, column2, column3...    
  FROM table1, table2, table3...    
  WHERE condition1 AND condition2... ;

- **`*` means select all columns.**
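
The difference between selecting everything and filtering with WHERE can be sketched as follows. This assumes Python's `sqlite3` module; the second phrase is hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE helloworld (phrase TEXT)")
conn.executemany(
    "INSERT INTO helloworld VALUES (?)",
    [("Hello World!",), ("Goodbye World!",)],  # hypothetical sample rows
)

# SELECT * returns every column of every row.
all_rows = conn.execute("SELECT * FROM helloworld").fetchall()

# WHERE keeps only the rows that satisfy the condition.
matching = conn.execute(
    "SELECT phrase FROM helloworld WHERE phrase = ?", ("Hello World!",)
).fetchall()

print(len(all_rows))  # 2
print(matching)       # [('Hello World!',)]
conn.close()
```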

## UPDATE

- As the name suggests, it updates the values of existing rows.
- syntax:    
    UPDATE table_name    
    SET column1 = value1, column2 = value2, ...    
    WHERE key = value;
- Without a **WHERE** clause, every row in the table is updated.
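
A short sketch of an UPDATE that touches exactly one row, assuming Python's `sqlite3` module. The `customers` table and its rows are illustrative sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, first_name, last_name) VALUES (?, ?, ?)",
    [(1, "John", "Doe"), (2, "Eric", "Smith")],  # hypothetical sample rows
)

# The WHERE clause restricts the update to the row with id = 2.
cursor = conn.execute(
    "UPDATE customers SET last_name = ? WHERE id = ?", ("Smythe", 2)
)
updated = cursor.rowcount  # number of rows the UPDATE changed

rows = conn.execute(
    "SELECT id, first_name, last_name FROM customers ORDER BY id"
).fetchall()
print(updated)  # 1
print(rows)     # [(1, 'John', 'Doe'), (2, 'Eric', 'Smythe')]
conn.close()
```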

## DELETE

- Deleting rows is very similar to updating rows: a **WHERE** clause selects the rows, and those rows are removed instead of modified.
- syntax: `DELETE FROM table_name WHERE column1 = value1 AND column2 = value2;`
- Without a **WHERE** clause, every row in the table is deleted.
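
A DELETE restricted by two conditions might look like the sketch below. It assumes Python's `sqlite3` module and a hypothetical `customers` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customers (id, first_name, last_name) VALUES (?, ?, ?)",
    [(1, "John", "Doe"), (2, "Eric", "Smith")],  # hypothetical sample rows
)

# Both conditions must hold for a row to be deleted.
cursor = conn.execute(
    "DELETE FROM customers WHERE first_name = ? AND last_name = ?",
    ("Eric", "Smith"),
)
deleted = cursor.rowcount  # number of rows removed

remaining = conn.execute(
    "SELECT id, first_name, last_name FROM customers ORDER BY id"
).fetchall()
print(deleted)    # 1
print(remaining)  # [(1, 'John', 'Doe')]
conn.close()
```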

## JOIN

- Joining tables is a very powerful concept in SQL, and it is used to combine data from multiple tables into a single result.
- syntax:    
    SELECT column_name(s)    
    FROM table1    
    JOIN table2    
    ON table1.column_name = table2.column_name;
- Conceptually, joining two tables starts from their Cartesian product, i.e. M * N row pairs (assuming the first table contains M rows and the second table contains N rows), and the **ON** condition keeps only the pairs that match. Without a join condition, the result contains all M * N rows.
- **ON** is used to specify the join condition, usually an equality between a column the two tables have in common.

- Example (sqlite3 shell):

      -- One row per customer.
      CREATE TABLE customers (
          id INTEGER PRIMARY KEY,
          first_name TEXT,
          last_name TEXT
      );

      -- Each order points at a customer through customer_id.
      CREATE TABLE orders (
          id INTEGER PRIMARY KEY,
          customer_id INTEGER,
          product_name TEXT
      );

      INSERT INTO customers (first_name, last_name) VALUES
          ('John', 'Doe');

      -- last_insert_rowid() (SQLite-specific) returns the rowid of the
      -- most recent successful INSERT.
      INSERT INTO orders (customer_id, product_name) VALUES
          (last_insert_rowid(), 'Coke'),
          (last_insert_rowid(), 'Sprite');

      INSERT INTO customers (first_name, last_name) VALUES
          ('Eric', 'Smith');

      INSERT INTO orders (customer_id, product_name) VALUES
          (last_insert_rowid(), 'Doritos');

      -- sqlite3 shell settings for readable, column-aligned output.
      .mode column
      .headers on

      -- Count the orders of each customer.
      SELECT first_name, last_name, COUNT(*) AS total_orders FROM customers
      JOIN orders ON orders.customer_id = customers.id
      GROUP BY orders.customer_id;
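
To make the relation between the Cartesian product and the join condition concrete, here is a sketch assuming Python's `sqlite3` module. It uses the same customers and orders as the example, but with explicit ids (an assumption made here so the rows are unambiguous).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, product_name TEXT);
    INSERT INTO customers (id, first_name, last_name) VALUES
        (1, 'John', 'Doe'), (2, 'Eric', 'Smith');
    INSERT INTO orders (customer_id, product_name) VALUES
        (1, 'Coke'), (1, 'Sprite'), (2, 'Doritos');
    """
)

# No join condition: every customer is paired with every order (M * N rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders"
).fetchone()[0]

# With ON, only pairs where the order belongs to the customer remain.
joined_count = conn.execute(
    "SELECT COUNT(*) FROM customers "
    "JOIN orders ON orders.customer_id = customers.id"
).fetchone()[0]

# Orders per customer, sorted by customer id for a stable result.
totals = conn.execute(
    "SELECT first_name, last_name, COUNT(*) AS total_orders FROM customers "
    "JOIN orders ON orders.customer_id = customers.id "
    "GROUP BY customers.id ORDER BY customers.id"
).fetchall()

print(cross_count)   # 6  (2 customers * 3 orders)
print(joined_count)  # 3
print(totals)        # [('John', 'Doe', 2), ('Eric', 'Smith', 1)]
conn.close()
```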

## GROUP BY

- Grouping is a very powerful concept in SQL, and it is used to aggregate multiple rows of data into a smaller set of results.
- syntax:    
    SELECT column_name(s)    
    FROM table_name    
    WHERE condition    
    GROUP BY column_name(s)    
    ORDER BY column_name(s);
- **GROUP BY** is used to specify the column name that we want to use to group the data.
- **ORDER BY** is used to specify the column name that we want to use to sort the data.
- **COUNT** is used to count the number of rows in each group.
- **SUM** is used to sum the values in a column in each group.
- **AVG** is used to average the values in a column in each group.
- **MIN** is used to get the minimum value of a column in each group.
- **MAX** is used to get the maximum value of a column in each group.
- **HAVING** is used to filter the groups that are returned.
- ex:      
    SELECT first_name, last_name, COUNT(*) AS total_orders FROM    
    customers    
    JOIN orders ON orders.customer_id = customers.id     
    GROUP BY orders.customer_id;    
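
Grouping combined with sorting can be sketched like this, assuming Python's `sqlite3` module. The `orders` rows are hypothetical sample data chosen so that each product has a different count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, product_name TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, product_name) VALUES (?, ?)",
    [
        (1, "Coke"), (1, "Sprite"), (2, "Coke"),
        (2, "Doritos"), (3, "Coke"), (3, "Sprite"),
    ],  # hypothetical sample rows
)

# GROUP BY collapses the rows of each product into one result row;
# ORDER BY then sorts the groups, most-ordered product first.
per_product = conn.execute(
    "SELECT product_name, COUNT(*) AS total_orders FROM orders "
    "GROUP BY product_name "
    "ORDER BY total_orders DESC, product_name"
).fetchall()

print(per_product)  # [('Coke', 3), ('Sprite', 2), ('Doritos', 1)]
conn.close()
```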

## AGGREGATE
- Aggregate functions are functions that aggregate multiple rows of data into a single value.
- syntax:    
    SELECT aggregate_function(column_name)    
    FROM table_name    
    WHERE condition    
    GROUP BY column_name(s)    
    ORDER BY column_name(s);
- Aggregating numbers can be done using mathematical functions such as `SUM`, `COUNT`, `AVG`, `MIN`, `MAX`, etc.
- Aggregating strings is usually done using a function such as `GROUP_CONCAT` (available in SQLite and MySQL), which concatenates the values of a column within each group, separated by commas by default.
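
The aggregate functions listed above can be compared side by side in one query. This sketch assumes Python's `sqlite3` module; the `price_cents` column and the sample rows are hypothetical. SQLite does not guarantee the order of items inside `GROUP_CONCAT`, so the sketch sorts them before printing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "product_name TEXT, price_cents INTEGER)"  # price_cents is a hypothetical column
)
conn.executemany(
    "INSERT INTO orders (customer_id, product_name, price_cents) VALUES (?, ?, ?)",
    [(1, "Coke", 150), (1, "Sprite", 250), (2, "Doritos", 300)],
)

# One result row per customer, with numeric and string aggregates.
stats = conn.execute(
    "SELECT customer_id, COUNT(*), SUM(price_cents), AVG(price_cents), "
    "MIN(price_cents), MAX(price_cents), GROUP_CONCAT(product_name) "
    "FROM orders GROUP BY customer_id ORDER BY customer_id"
).fetchall()

for customer_id, n, total, average, lowest, highest, products in stats:
    # Sort the concatenated names because their order is not guaranteed.
    print(customer_id, n, total, average, lowest, highest, sorted(products.split(",")))
conn.close()
```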




## HAVING
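- The HAVING clause is an essential part of many GROUP BY queries. It is almost identical to the WHERE clause, but it filters on aggregate values after the aggregation phase, whereas the WHERE clause filters rows before aggregation.
- syntax:    
    SELECT column_name(s), aggregate_function(column_name)    
    FROM table_name    
    WHERE condition    
    GROUP BY column_name(s)    
    HAVING aggregate_condition;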
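
The before/after distinction between WHERE and HAVING can be sketched as follows, assuming Python's `sqlite3` module and hypothetical `orders` rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, product_name TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, product_name) VALUES (?, ?)",
    [
        (1, "Coke"), (1, "Sprite"),
        (2, "Doritos"),
        (3, "Coke"), (3, "Coke"), (3, "Pepsi"),
    ],  # hypothetical sample rows
)

# HAVING filters groups: keep customers with at least two orders.
frequent = conn.execute(
    "SELECT customer_id, COUNT(*) FROM orders "
    "GROUP BY customer_id HAVING COUNT(*) >= 2 ORDER BY customer_id"
).fetchall()

# WHERE filters rows first (only Coke orders are counted),
# then HAVING filters the resulting groups.
frequent_coke = conn.execute(
    "SELECT customer_id, COUNT(*) FROM orders "
    "WHERE product_name = 'Coke' "
    "GROUP BY customer_id HAVING COUNT(*) >= 2 ORDER BY customer_id"
).fetchall()

print(frequent)       # [(1, 2), (3, 3)]
print(frequent_coke)  # [(3, 2)]
conn.close()
```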

## Rules

- When defining a table, declare a `PRIMARY KEY` so that each record has a unique identifier. In standard SQL a primary key is unique for all records and cannot be NULL.
- `NOT NULL` is used when a column must not contain NULL values.
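
Both rules can be observed in practice: the database rejects rows that violate them. The sketch below assumes Python's `sqlite3` module, which reports constraint violations as `sqlite3.IntegrityError`; the table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO customers (id, first_name) VALUES (1, 'John')")

bad_statements = [
    "INSERT INTO customers (id, first_name) VALUES (1, 'Eric')",  # duplicate primary key
    "INSERT INTO customers (id, first_name) VALUES (2, NULL)",    # NULL in a NOT NULL column
]

errors = []
for sql in bad_statements:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        # The constraint violation is reported and the row is not inserted.
        errors.append(str(exc))

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(len(errors))  # 2
print(row_count)    # 1
conn.close()
```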

#### ref: https://www.learnsqlonline.org/  