<a href="https://colab.research.google.com/github/Indranil0603/versatile-data-kit/blob/Indranil%2FIndranil0603%2FColab-notebook-processing-data-using-SQL-and-local-database/examples/sqlite-example-notebook/sqlite-example-notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Processing data using SQL and a local database

This notebook is a guide to reading data from a local SQLite database, processing it, and writing the result back to the same database using [Versatile Data Kit (VDK)](https://github.com/vmware/versatile-data-kit).


<a name="prerequisites"></a>
## 1. Prerequisites

### 1.1 Good to Know Before You Start

This tutorial is easiest to follow if you are familiar with:

- **Python and SQL**: Basic commands and queries
- **Tools**: Comfort with command line and Jupyter Notebook

### 1.2 Useful notebook shortcuts

To run a code cell, do one of the following:

* Click the **Play icon** in the left gutter of the cell;
* Type **Cmd/Ctrl+Enter** to run the cell in place;
* Type **Shift+Enter** to run the cell and move focus to the next cell (adding one if none exists); or
* Type **Alt+Enter** to run the cell and insert a new code cell immediately below it.

There are additional options for running some or all cells in the **Runtime** menu on top.

### 1.3 Install Versatile Data Kit and required plugins




In [None]:
!pip install vdk-ipython vdk-sqlite

The relevant Data Job code is in the upcoming cells.
<br>Alternatively, you can see the full implementation of the data job <a href="https://github.com/vmware/versatile-data-kit/tree/main/examples/sqlite-processing-example/sqlite-example-job">here</a>.

## 2. Database

We will use the Chinook sample SQLite database. The following commands download and unpack it.

In [None]:
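# Download the archive; -o overwrites the file instead of appending to it on a re-run
!curl -L -o chinook.zip https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip
# Unpack without prompting if chinook.db already exists, then remove the archive
!unzip -o chinook.zip
!rm chinook.zip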

`chinook.db` should now be located in the same directory where the zip file was downloaded.
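
To confirm that the download worked, one option is to list the tables in the file with Python's built-in `sqlite3` module. This is only a sketch: the helper name `list_tables` is made up for illustration and is not part of VDK. The Data Job later in this notebook queries the `customers` and `employees` tables, so both should appear in the output.

```python
import os
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all user tables in the SQLite file at db_path."""
    # closing() makes sure the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Only inspect the file if it exists; sqlite3.connect would otherwise
# silently create an empty database with that name.
if os.path.exists("chinook.db"):
    print(list_tables("chinook.db"))
```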

## 3. Configuration

We have already installed Versatile Data Kit and the plugins required for the example. Now the path to the database we just downloaded must be declared in the `VDK_SQLITE_FILE` environment variable.


In [None]:
%env VDK_SQLITE_FILE=chinook.db
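
The `%env` magic sets the variable in the notebook kernel's process environment, so it can be read back from Python through `os.environ`. As a rough sanity check, the sketch below resolves the configured path and reports whether the file exists. The helper name `resolve_sqlite_file` is hypothetical, and resolving a relative path against the kernel's working directory is an assumption based on how SQLite itself opens files, not on documented VDK behavior.

```python
import os


def resolve_sqlite_file(environ=os.environ):
    """Return the absolute path named by VDK_SQLITE_FILE, or None if unset."""
    value = environ.get("VDK_SQLITE_FILE")
    if not value:
        return None
    # Assumption: a relative path is interpreted relative to the current
    # working directory of the notebook kernel.
    return os.path.abspath(value)


path = resolve_sqlite_file()
if path is None:
    print("VDK_SQLITE_FILE is not set")
else:
    print(path, "exists" if os.path.isfile(path) else "is missing")
```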

To load the extension in a Colab notebook, run the following command:

In [None]:
%reload_ext vdk.plugin.ipython

Then load VDK (the job control object):

In [None]:
%reload_VDK

## 4. Data Job

The structure of our Data Job in the following cells is:<br><br>
**sqlite-example-job**<br>
├── 1-Drop Table<br>
├── 2-Create Table<br>
└── 3-Do the processing<br><br>

The purpose of our Data Job ***sqlite-example-job*** is to extract the EmployeeId and names of employees who work with customers, and the number of customers they work with, and insert them into a newly-created table called ***customer_count_per_employee***.<br><br>

Our Data Job consists of three SQL steps. We will run each query in the notebook with the ***%%vdksql*** cell magic.<br><br>

**Each SQL step is a separate query:**

- The first step deletes the new table if it exists. This query only serves to make the Data Job repeatable;
- The second step creates the table into which we will insert the data;
- The third step performs the described processing and inserts the new data into the customer_count_per_employee table.

<br>
Run each of the following cells in order to observe the job in action.


### Step 1: Drop Table

In [None]:
%%vdksql
DROP TABLE IF EXISTS customer_count_per_employee;

### Step 2: Create Table

In [None]:
%%vdksql
CREATE TABLE customer_count_per_employee (EmployeeId, EmployeeFirstName, EmployeeLastName, CustomerCount);

### Step 3: Do the processing

In [None]:
%%vdksql
INSERT INTO customer_count_per_employee
SELECT SupportRepId, employees.FirstName, employees.LastName, COUNT(CustomerId)
FROM (customers INNER JOIN employees ON customers.SupportRepId = employees.EmployeeId)
GROUP BY SupportRepId;
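
To see what the three steps do without touching `chinook.db`, the sketch below replays the same SQL against a tiny in-memory database using Python's built-in `sqlite3` module. The sample employees and customers are invented for illustration, and the helper `run_job_steps` is hypothetical; it bypasses VDK entirely and only mirrors the queries above.

```python
import sqlite3


def run_job_steps(conn):
    """Run the drop, create, and insert steps of the Data Job on conn."""
    conn.execute("DROP TABLE IF EXISTS customer_count_per_employee")
    conn.execute(
        "CREATE TABLE customer_count_per_employee "
        "(EmployeeId, EmployeeFirstName, EmployeeLastName, CustomerCount)"
    )
    conn.execute(
        """
        INSERT INTO customer_count_per_employee
        SELECT SupportRepId, employees.FirstName, employees.LastName, COUNT(CustomerId)
        FROM (customers INNER JOIN employees ON customers.SupportRepId = employees.EmployeeId)
        GROUP BY SupportRepId
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# Made-up stand-ins for the two Chinook tables the job reads.
conn.execute("CREATE TABLE employees (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)")
conn.execute("CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, SupportRepId INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Kim"), (3, "Cy", "Roe")],
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

run_job_steps(conn)
run_job_steps(conn)  # running twice is safe because step 1 drops the table

rows = conn.execute(
    "SELECT * FROM customer_count_per_employee ORDER BY EmployeeId"
).fetchall()
print(rows)  # [(1, 'Ann', 'Lee', 2), (2, 'Bob', 'Kim', 1)]
```

Employee 3 has no customers, so the `INNER JOIN` leaves that employee out of the result.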

## 5. Results

After running the Data Job, we can check whether the new table was populated correctly by querying it:

In [None]:
%%vdksql
SELECT * FROM customer_count_per_employee;
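
Beyond eyeballing the rows, one possible consistency check is that the per-employee counts add up to the number of customers that have a matching support representative. The sketch below does this with Python's built-in `sqlite3` module. The helper name `counts_are_consistent` is hypothetical and not part of VDK, and the check assumes the Data Job has already been run against the database.

```python
import os
import sqlite3
from contextlib import closing


def counts_are_consistent(conn):
    """Compare the summed CustomerCount with the number of joined customers."""
    total_in_table = conn.execute(
        "SELECT COALESCE(SUM(CustomerCount), 0) FROM customer_count_per_employee"
    ).fetchone()[0]
    joined_customers = conn.execute(
        "SELECT COUNT(*) FROM customers "
        "INNER JOIN employees ON customers.SupportRepId = employees.EmployeeId"
    ).fetchone()[0]
    return total_in_table == joined_customers


# Assumes the Data Job above has already been run against chinook.db.
if os.path.exists("chinook.db"):
    with closing(sqlite3.connect("chinook.db")) as conn:
        print(counts_are_consistent(conn))
```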