# Updating and Cleaning Tables
© Explore Data Science Academy

## Learning Objectives

In this train, you will learn:
- How to modify the data entries of a SQL table;
- How to remove one or more rows in a SQL table; and
- How to delete SQL tables.

## Outline

This train is structured as follows: 

- The UPDATE Statement
    - Updating table rows
    - Updating table rows with a condition
- The DELETE Statement
    - Deleting table entries
    - Deleting table entries with a condition
- The DROP statement
    - Deleting tables

## Overview

So far, we have covered SQL statements for creating databases and reading data from them. However, we have yet to discuss how they can be modified: specifically, how to update existing information in a database table, delete rows from a table, and delete entire tables.
In this train, we cover the three statements that can be used to achieve these outcomes.

Let's begin!


Load SQL magics and database:

In [1]:
%load_ext sql

# Load SQLite database
%sql sqlite:///chinook.db

For your convenience, here is the [ER (Entity Relationship) diagram](https://www.lucidchart.com/pages/er-diagrams) of the Chinook database:

<img src="https://github.com/Explore-AI/Pictures/blob/master/sqlite-sample-database-color.jpg?raw=true" width=70%/>

_[Image source](https://www.sqlitetutorial.net/sqlite-sample-database/)_


## 1. The Update Statement
Databases in SQL are mutable: at any time, we can update the information stored in them. For this purpose, SQL has the `UPDATE` statement, which changes the values stored in one or more columns of a table.

**Syntax:**

```SQL
UPDATE table_name 
   SET column1 = value1, 
       column2 = value2, ...
```

### 1.1. Updating table rows
Suppose the Chinook media company had a sale where all media items cost 50 cents. To implement this, we need to change the `UnitPrice` of **all items** in the `tracks` table to 0.50, an ideal problem for the `UPDATE` statement. With no `WHERE` clause, the statement applies to every row:

In [2]:
%%sql

UPDATE tracks 
   SET UnitPrice = 0.50;

 * sqlite:///chinook.db
3503 rows affected.


[]

Note that we can update multiple columns this way, as indicated in the syntax. Let's check to see if our query worked as expected: 

In [3]:
%%sql

SELECT Name, UnitPrice
FROM tracks
LIMIT 10;

 * sqlite:///chinook.db
Done.


Name,UnitPrice
For Those About To Rock (We Salute You),0.5
Balls to the Wall,0.5
Fast As a Shark,0.5
Restless and Wild,0.5
Princess of the Dawn,0.5
Put The Finger On You,0.5
Let's Get It Up,0.5
Inject The Venom,0.5
Snowballed,0.5
Evil Walks,0.5


As you can see, everything now costs [50 cent](https://en.wikipedia.org/wiki/50_Cent)! 
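
The syntax above also allows several columns to be set in one statement. The following is a minimal sketch of that, using Python's built-in `sqlite3` module on a small in-memory table rather than the real `chinook.db`. The rows are made up, and the `OnSale` column is a hypothetical addition that does not exist in the Chinook `tracks` table.

```python
import sqlite3

# Toy stand-in for the tracks table. OnSale is a hypothetical column
# used only for this illustration; it is not part of Chinook.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks ("
    "TrackId INTEGER PRIMARY KEY, Name TEXT, UnitPrice REAL, OnSale INTEGER)"
)
conn.executemany(
    "INSERT INTO tracks (Name, UnitPrice, OnSale) VALUES (?, ?, ?)",
    [
        ("Balls to the Wall", 0.99, 0),
        ("Fast As a Shark", 0.99, 0),
        ("Snowballed", 1.99, 0),
    ],
)

# No WHERE clause, so both columns are changed in every row.
cursor = conn.execute("UPDATE tracks SET UnitPrice = 0.50, OnSale = 1")
rows_changed = cursor.rowcount

# Every row should now share the same (UnitPrice, OnSale) pair.
distinct_values = conn.execute(
    "SELECT DISTINCT UnitPrice, OnSale FROM tracks"
).fetchall()
print(rows_changed, distinct_values)
```

The `rowcount` attribute plays the same role as the "rows affected" message shown by the `%sql` magic.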

### 1.2. Updating table rows with a condition
The `UPDATE` statement can also update only the rows that satisfy a given condition. This is useful when we want to change one or more specific rows. The syntax for this is as follows:

```SQL
UPDATE table_name 
   SET column1 = value1, 
       column2 = value2, ...
   WHERE
       search_condition
```
Let's do an example. 

As part of the data cleaning process, we sometimes need to get rid of NULL values in a given table column. Because we don't know what values are supposed to go in their place, we take a guess. This process of guessing is formally referred to as **data imputation**. 

Let's put this into practice by **imputing** the missing values in the `Composer` column of the `tracks` table. Our strategy is to replace every missing value with the most common composer in the table, i.e., we assume that all tracks without a composer were composed by the composer with the most tracks in the table.

1. Finding the most common composer (i.e. the mode of the Composer column):

In [4]:
%%sql

SELECT Composer, count(Composer) AS "Number of Compositions" 
FROM tracks
GROUP BY Composer
ORDER BY 2 DESC
LIMIT 10;

 * sqlite:///chinook.db
Done.


Composer,Number of Compositions
Steve Harris,80
U2,44
Jagger/Richards,35
Billy Corgan,31
Kurt Cobain,26
Bill Berry-Peter Buck-Mike Mills-Michael Stipe,25
The Tea Party,24
Miles Davis,23
Gilberto Gil,23
Chris Cornell,23


Our query tells us that Steve Harris is the composer with the most tracks (80).
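
One detail of the query above is worth noting: `count(Composer)` counts only non-NULL values, so the group of tracks with no composer gets a count of 0 and cannot top the list, however many rows it contains. The sketch below illustrates the difference between `COUNT(Composer)` and `COUNT(*)` on a few made-up rows in an in-memory SQLite table; the aliases `named` and `all_rows` are chosen for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (Name TEXT, Composer TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?)",
    [
        ("Track A", "U2"),
        ("Track B", "U2"),
        ("Track C", None),  # None is stored as NULL
        ("Track D", None),
        ("Track E", None),
    ],
)

# COUNT(Composer) skips NULLs; COUNT(*) counts every row in the group.
counts = conn.execute(
    """
    SELECT Composer, COUNT(Composer) AS named, COUNT(*) AS all_rows
      FROM tracks
     GROUP BY Composer
     ORDER BY Composer
    """
).fetchall()

for composer, named, all_rows in counts:
    print(composer, named, all_rows)
```

In this toy data the NULL group is the largest by `COUNT(*)`, yet its `COUNT(Composer)` is 0, which is the behaviour we want when looking for the most common named composer.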

2. Now let's replace all NULL values in the `Composer` column with Steve Harris:

In [5]:
%%sql

UPDATE tracks
   SET Composer = 'Steve Harris'
 WHERE Composer IS NULL;

 * sqlite:///chinook.db
978 rows affected.


[]

We have replaced all missing values in the `Composer` column with "Steve Harris". We can also verify that there are no more missing values in the column as follows:

In [6]:
%%sql

SELECT *
FROM tracks
WHERE Composer IS NULL;

 * sqlite:///chinook.db
Done.


TrackId,Name,AlbumId,MediaTypeId,GenreId,Composer,Milliseconds,Bytes,UnitPrice


Success!
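
The two steps above (find the mode, then update) could also be combined so that the composer's name is not typed by hand. The following is a sketch of one possible way to do this with a scalar subquery, run through Python's `sqlite3` module on a made-up in-memory table. It is an alternative formulation offered for illustration, not the approach used in this train.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracks (TrackId INTEGER PRIMARY KEY, Name TEXT, Composer TEXT)"
)
conn.executemany(
    "INSERT INTO tracks (Name, Composer) VALUES (?, ?)",
    [
        ("Track A", "Steve Harris"),
        ("Track B", "Steve Harris"),
        ("Track C", "U2"),
        ("Track D", None),
        ("Track E", None),
        ("Track F", None),
    ],
)

# The subquery returns the most frequent non-NULL composer (the mode);
# the outer UPDATE writes it into every row where Composer is NULL.
cursor = conn.execute(
    """
    UPDATE tracks
       SET Composer = (
               SELECT Composer
                 FROM tracks
                WHERE Composer IS NOT NULL
                GROUP BY Composer
                ORDER BY COUNT(*) DESC
                LIMIT 1
           )
     WHERE Composer IS NULL
    """
)
imputed_rows = cursor.rowcount

remaining_nulls = conn.execute(
    "SELECT COUNT(*) FROM tracks WHERE Composer IS NULL"
).fetchone()[0]
harris_tracks = conn.execute(
    "SELECT COUNT(*) FROM tracks WHERE Composer = 'Steve Harris'"
).fetchone()[0]
print(imputed_rows, remaining_nulls, harris_tracks)
```

If two composers were tied for first place, `LIMIT 1` would pick one of them without any guarantee as to which, so a tie-breaking rule would need to be added to the `ORDER BY` clause.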

## 2. The Delete Statement

SQL would not be complete without the ability to remove data from the database. The `DELETE` statement is used to permanently delete rows in a given database table. 

**Syntax (this may vary in other SQL engines):**

```SQL
DELETE FROM table_name;
```
Let's do some examples:

### 2.1. Deleting table entries
The default behaviour of the `DELETE` statement is that it will delete all the rows of the given database table. To see it in action, let's attempt to delete all entries in the playlists table. 

In [7]:
%%sql 

DELETE FROM playlists;

 * sqlite:///chinook.db
18 rows affected.


[]

Let's verify that all rows in the playlists table have indeed been deleted:

In [8]:
%%sql

SELECT *
FROM playlists;

 * sqlite:///chinook.db
Done.


PlaylistId,Name


Success! ... Well, we hope that we didn't accidentally remove something important!
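
One common safeguard against an accidental `DELETE` is to run it inside a transaction, inspect the result, and roll back if it is not what was intended. The sketch below shows the idea with Python's `sqlite3` module and a made-up in-memory `playlists` table. It assumes explicit `BEGIN` and `ROLLBACK` statements and says nothing about how the `%sql` magic used in this notebook handles commits.

```python
import sqlite3

# isolation_level=None leaves transaction control entirely to the
# explicit BEGIN / ROLLBACK statements below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE playlists (PlaylistId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO playlists (Name) VALUES (?)",
    [("Music",), ("Movies",), ("TV Shows",)],
)


def count_playlists():
    """Return the current number of rows in playlists."""
    return conn.execute("SELECT COUNT(*) FROM playlists").fetchone()[0]


conn.execute("BEGIN")
conn.execute("DELETE FROM playlists")
rows_inside_transaction = count_playlists()  # the table looks empty here

conn.execute("ROLLBACK")                     # undo the uncommitted DELETE
rows_after_rollback = count_playlists()

print(rows_inside_transaction, rows_after_rollback)
```

Once a transaction has been committed, the deleted rows cannot be recovered this way, which is why the deletion is described as permanent.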

### 2.2. Deleting table entries with a condition
As with the `UPDATE` statement, the `DELETE` statement can also be used with a condition when we don't want to remove all the rows in a given table. The syntax for this is as follows:

```SQL
DELETE FROM table_name 
WHERE condition
```

Let's do an example:

Suppose that due to hosting platform issues, Chinook has decided to only host audio media types. Write a query that will remove all non-audio tracks from the database. 

1. Let's first take a look at the media_types table to see what media types exist:

In [9]:
%%sql

SELECT * 
FROM media_types;

 * sqlite:///chinook.db
Done.


MediaTypeId,Name
1,MPEG audio file
2,Protected AAC audio file
3,Protected MPEG-4 video file
4,Purchased AAC audio file
5,AAC audio file


The only non-audio media-type here is the "Protected MPEG-4 video file", i.e. MediaTypeId = 3. 

2. Let's remove all tracks that are Protected MPEG-4 video files:

In [10]:
%%sql

DELETE FROM tracks
WHERE MediaTypeId = 3;

 * sqlite:///chinook.db
214 rows affected.


[]

Let's verify:

In [11]:
%%sql

SELECT m.Name, count(t.MediaTypeId) AS "Number of tracks" 
FROM tracks t
LEFT JOIN media_types m
ON m.MediaTypeId = t.MediaTypeId
GROUP BY 1;

 * sqlite:///chinook.db
Done.


Name,Number of tracks
AAC audio file,11
MPEG audio file,3034
Protected AAC audio file,237
Purchased AAC audio file,7


As shown in the resulting table, all Protected MPEG-4 video tracks have been removed.
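
Because a conditional `DELETE` removes whatever the condition matches, it can help to run a `SELECT COUNT(*)` with the same `WHERE` clause first. The sketch below does this on made-up in-memory tables. It also selects the media type by name through a subquery instead of hard-coding `MediaTypeId = 3`; the `LIKE '%video%'` pattern is an assumption that only suits media type names like the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE media_types (MediaTypeId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tracks (
        TrackId INTEGER PRIMARY KEY, Name TEXT, MediaTypeId INTEGER
    );
    INSERT INTO media_types VALUES
        (1, 'MPEG audio file'),
        (3, 'Protected MPEG-4 video file');
    INSERT INTO tracks (Name, MediaTypeId) VALUES
        ('Song one', 1), ('Song two', 1),
        ('Episode one', 3), ('Episode two', 3);
    """
)

# A fixed string defined in this script (never user input), reused so
# that the preview and the DELETE match exactly the same rows.
condition = (
    "MediaTypeId IN "
    "(SELECT MediaTypeId FROM media_types WHERE Name LIKE '%video%')"
)

# 1. Preview how many rows the condition matches.
to_delete = conn.execute(
    f"SELECT COUNT(*) FROM tracks WHERE {condition}"
).fetchone()[0]

# 2. Delete them and compare the affected-row count with the preview.
deleted = conn.execute(f"DELETE FROM tracks WHERE {condition}").rowcount

remaining = [
    row[0] for row in conn.execute("SELECT Name FROM tracks ORDER BY TrackId")
]
print(to_delete, deleted, remaining)
```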

## 3. The Drop Statement
The `DROP` statement is a versatile deletion statement that can be used to remove various database objects. Unlike the `DELETE` statement, which removes rows but leaves the table in place, `DROP TABLE` removes the entire table, including its definition.

**Syntax:**

```SQL
DROP TABLE table_name;
```
Depending on the SQL engine, `DROP TABLE` may produce an error when the table is used in a [view](https://www.w3schools.com/sql/sql_view.asp) or is referenced in the trigger actions of other tables. If neither condition applies, we are free to delete the table in question.

Let's do an example:

### 3.1. Deleting tables
Let's delete the playlists and playlist_track tables:

1. Deleting playlists:

In [12]:
%%sql

DROP TABLE playlists;

 * sqlite:///chinook.db
Done.


[]

Let's verify that our query worked:

In [13]:
%%sql

SELECT *
FROM playlists;

 * sqlite:///chinook.db
(sqlite3.OperationalError) no such table: playlists
[SQL: SELECT * FROM playlists;]
(Background on this error at: http://sqlalche.me/e/e3q8)


Success! SQLite raises the `no such table` error, which confirms that the table no longer exists.

2. Deleting playlist_track:

In [14]:
%%sql

DROP TABLE playlist_track;

 * sqlite:///chinook.db
Done.


[]
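
To make the difference between `DELETE` and `DROP` concrete, the sketch below checks SQLite's catalogue table, `sqlite_master`, after each statement. It uses a made-up in-memory `playlist_track` table, and `table_exists` is a hypothetical helper written for this example. It also shows the `IF EXISTS` clause, which SQLite accepts and which avoids the `no such table` error when the table has already been dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE playlist_track (PlaylistId INTEGER, TrackId INTEGER)")
conn.execute("INSERT INTO playlist_track VALUES (1, 3402)")


def table_exists(name):
    """Hypothetical helper: look the table up in SQLite's catalogue."""
    rows = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchall()
    return len(rows) > 0


conn.execute("DELETE FROM playlist_track")
exists_after_delete = table_exists("playlist_track")  # rows gone, table stays

conn.execute("DROP TABLE playlist_track")
exists_after_drop = table_exists("playlist_track")    # table itself is gone

# Dropping a table that no longer exists raises an error ...
try:
    conn.execute("DROP TABLE playlist_track")
    second_drop_failed = False
except sqlite3.OperationalError:
    second_drop_failed = True

# ... unless IF EXISTS is added, in which case nothing happens.
conn.execute("DROP TABLE IF EXISTS playlist_track")

print(exists_after_delete, exists_after_drop, second_drop_failed)
```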

And that's it for this train!

## Conclusion
In this train, we presented some of the elements required for cleaning data in a database: how to modify the data entries of a SQL table, how to remove one or more rows from a SQL table, and how to delete SQL tables. We introduced the `UPDATE`, `DELETE`, and `DROP` SQL statements and provided examples highlighting their usage.

## Additional links
- [The Update statement](https://www.dofactory.com/sql/update)
- [SQL DELETE Examples](https://www.dofactory.com/sql/delete)
- [The DROP TABLE statement](https://db.apache.org/derby/docs/10.13/ref/rrefsqlj34148.html)
