# Database-to-Database Ingestion


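In this notebook we will use the [Versatile Data Kit (VDK)](https://github.com/vmware/versatile-data-kit) to develop an ingestion Data Job. The job reads data from a table in a local SQLite database and writes it into a backup table through VDK's ingestion API. In this example the target is the same `chinook.db` file, configured through `VDK_SQLITE_FILE`. Pointing that variable at a different SQLite file would write the backup into a separate database instead.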

<a name="prerequisites"></a>
## 1. Prerequisites

### 1.1 Good to Know Before You Start

This tutorial is easier to follow if you are familiar with:

- **Python and SQL**: basic commands and queries
- **Tools**: the command line and Jupyter Notebook

### 1.2 Useful notebook shortcuts

* Click the **Play icon** in the left gutter of the cell;
* Type **Cmd/Ctrl+Enter** to run the cell in place;
* Type **Shift+Enter** to run the cell and move focus to the next cell (adding one if none exists); or
* Type **Alt+Enter** to run the cell and insert a new code cell immediately below it.

More options for running some or all cells are available in the **Runtime** menu at the top.

### 1.3 Install Versatile Data Kit and required plugins




```python
!pip install quickstart-vdk vdk-notebook vdk-ipython==0.2.5 vdk-data-sources vdk-singer tap-rest-api-msdk
```



The Data Job code is in the cells that follow.
<br>Alternatively, you can view the implementation of the Data Job <a href="https://github.com/vmware/versatile-data-kit/tree/main/examples/sqlite-processing-example/sqlite-example-job">here</a>.

## 2. Database

We will use the Chinook sample SQLite database. Download it with the following commands:

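```python
# -L follows redirects; -o writes (instead of appending) to chinook.zip, so re-runs don't corrupt it
!curl -L -o chinook.zip https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip
# -o overwrites an existing chinook.db without prompting, so the cell can be re-run
!unzip -o chinook.zip
!rm chinook.zip
```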

```text
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  298k  100  298k    0     0  1995k      0 --:--:-- --:--:-- --:--:-- 2002k
Archive:  chinook.zip
  inflating: chinook.db
```


`chinook.db` should now be in the current working directory, where the zip file was downloaded.
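Before moving on, it can help to confirm that the file was extracted and contains the expected tables. The sketch below uses only Python's built-in `sqlite3` module; `list_tables` is a hypothetical helper, not part of VDK. It checks that the file exists first, because `sqlite3.connect` silently creates an empty database when given a wrong path.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the sorted names of the user tables in a SQLite database file."""
    # sqlite3.connect would silently create an empty database for a wrong path,
    # so check explicitly that the download and unzip succeeded.
    if not Path(db_path).is_file():
        raise FileNotFoundError(f"{db_path} not found; re-run the download cell")
    connection = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every schema object; skip SQLite's internal tables
        rows = connection.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

In the notebook, `list_tables("chinook.db")` should include `employees`, the table this job backs up.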

## 3. Configuration

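We have already installed Versatile Data Kit and the plugins required for the example. Next, we configure VDK through environment variables: the path to the database we just downloaded, SQLite as the default database type and ingestion method, and a setting that makes the ingester wait for each send to finish.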


```python
%env VDK_SQLITE_FILE=chinook.db
%env DB_DEFAULT_TYPE=sqlite
%env INGEST_METHOD_DEFAULT=sqlite
%env INGESTER_WAIT_TO_FINISH_AFTER_EVERY_SEND=true
```

```text
env: VDK_SQLITE_FILE=chinook.db
env: DB_DEFAULT_TYPE=sqlite
env: INGEST_METHOD_DEFAULT=sqlite
env: INGESTER_WAIT_TO_FINISH_AFTER_EVERY_SEND=true
```
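`%env` is an IPython magic that sets variables in the kernel's `os.environ`. If you prefer plain Python (for example, to keep all settings in one place), a minimal equivalent, assuming the same four values, is shown below; `VDK_SQLITE_SETTINGS` is only an illustrative name. VDK reads its configuration when it is initialized, so set these before running `%reload_VDK`.

```python
import os

# Same four settings as the %env cell above (the dict name is illustrative)
VDK_SQLITE_SETTINGS = {
    # SQLite file that %%vdksql queries and VDK ingestion write to
    "VDK_SQLITE_FILE": "chinook.db",
    # Database type used for queries such as %%vdksql and job_input.execute_query
    "DB_DEFAULT_TYPE": "sqlite",
    # Ingestion method used by job_input.send_*_for_ingestion
    "INGEST_METHOD_DEFAULT": "sqlite",
    # Make the ingester wait for each send to finish before continuing
    "INGESTER_WAIT_TO_FINISH_AFTER_EVERY_SEND": "true",
}

for name, value in VDK_SQLITE_SETTINGS.items():
    os.environ[name] = value
    print(f"env: {name}={value}")
```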


The `vdk.plugin.ipython` extension adds VDK magic commands to Jupyter, such as `%reload_VDK` and `%%vdksql`.

`%reload_VDK` loads VDK for the current notebook. VDK provides the `job_input` API, which has methods for:
* executing queries against the configured database (here, SQLite);
* ingesting data into a database;
* processing data into a database.

Run `help(job_input)` to see its documentation.
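As an example of the query side of this API, the sketch below wraps `job_input.execute_query`, which runs SQL against the configured database and returns the result rows. `table_row_count` is a hypothetical helper; it takes `job_input` as a parameter so it can also be tried with a stand-in object. The table name is interpolated into the SQL because SQLite cannot bind identifiers as parameters, so only pass trusted names.

```python
def table_row_count(job_input, table_name: str) -> int:
    """Hypothetical helper: count the rows of a table in the configured database."""
    # execute_query runs the SQL against the database selected by DB_DEFAULT_TYPE
    # (SQLite here) and returns the result rows; COUNT(*) yields one row, one column.
    rows = job_input.execute_query(f"SELECT COUNT(*) FROM {table_name}")
    return rows[0][0]
```

Once VDK is loaded, `table_row_count(job_input, "employees")` returns the number of rows in the source table.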

```python
# NOTE: This cell may fail the first time it runs. If it does, run it again and it should succeed.

%reload_ext vdk.plugin.ipython
%reload_VDK
job_input = VDK.get_initialized_job_input()
```

```text
2025-01-28 12:01:38,340 [VDK] [INFO ] vdk.plugin.control_cli_plugin. properties_plugin.py :30   initialize_job   - Control Service REST API URL is not configured. Will not initialize Control Service based Properties client implementation.
2025-01-28 12:01:38,346 [VDK] [INFO ] vdk.plugin.control_cli_plugin.    execution_skip.py :105  _skip_job_if_nec - Checking if job should be skipped:
2025-01-28 12:01:38,347 [VDK] [INFO ] vdk.plugin.control_cli_plugin.    execution_skip.py :106  _skip_job_if_nec - Job : content, Team : None, Log config: LOCAL, execution_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698
```





## 4. Data Job

The Data Job in the following cells is structured as follows:<br><br>
**ingest-from-db-example-job**<br>
├── 1-Drop Table<br>
├── 2-Create Table<br>
└── 3-Ingest to Table<br><br>

The purpose of the Data Job ***ingest-from-db-example-job*** is to demonstrate how to query data from a source database and ingest it into a target database.<br><br>

Our Data Job consists of three steps: two SQL steps, which we run in the notebook with the ***%%vdksql*** cell magic, and one Python step.<br><br>

**Each step does one thing:**

- The first step drops the backup table if it exists. This query only serves to make the Data Job repeatable;
- The second step creates the backup table we will insert data into;
- The third step connects to the source database, queries the `employees` table, and ingests the returned rows into the table named by `destination_table` (`backup_employees`) in the target database.

<br>
Run each of the following cells in order to observe the job in action.
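For comparison, here is a sketch of what the three steps amount to when the source and backup tables live in the same SQLite file, as they do in this example, written with only the standard `sqlite3` module. It is not how VDK runs the job (VDK routes the third step through its pluggable ingestion rather than a single `INSERT ... SELECT`), and `backup_table` is a hypothetical name. The second step here copies the source's columns with `CREATE TABLE ... AS SELECT ... WHERE 0`, which creates the table without copying any rows.

```python
import sqlite3


def backup_table(db_path: str, source: str, backup: str) -> int:
    """Hypothetical stdlib-only version of the job's three steps; returns rows copied."""
    connection = sqlite3.connect(db_path)
    try:
        # Step 1: drop the previous backup so the job can be re-run
        connection.execute(f"DROP TABLE IF EXISTS {backup}")
        # Step 2: create an empty table with the source's columns (WHERE 0 matches no rows)
        connection.execute(f"CREATE TABLE {backup} AS SELECT * FROM {source} WHERE 0")
        # Step 3: copy every row; the with-block commits the INSERT (or rolls it back on error)
        with connection:
            cursor = connection.execute(f"INSERT INTO {backup} SELECT * FROM {source}")
        return cursor.rowcount
    finally:
        connection.close()
```

Because the first step drops the backup, calling `backup_table` twice leaves the same rows in place rather than duplicating them, which is the same repeatability the Data Job's first step provides.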


### Step 1: Drop Table

```sql
%%vdksql
DROP TABLE IF EXISTS backup_employees;
```

```text
2025-01-28 12:01:38,361 [VDK] [INFO ] vdk.plugin.sqlite.sqlite_conne sqlite_connection.py :29   new_connection   - Creating new connection against local file database located at: chinook.db
2025-01-28 12:01:38,364 [VDK] [INFO ] vdk.plugin.sqlite.sqlite_conne sqlite_connection.py :29   new_connection   - Creating new connection against local file database located at: chinook.db
2025-01-28 12:01:38,367 [VDK] [INFO ] vdk.internal.builtin_plugins.c    managed_cursor.py :201  _execute_operati - Executing query:
-- job_name: content
-- op_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698
DROP TABLE IF EXISTS backup_employees;

2025-01-28 12:01:38,372 [VDK] [INFO ] vdk.internal.builtin_plugins.c    managed_cursor.py :103  execute          - Executing query SUCCEEDED. Query duration 00h:00m:00s

'Query statement executed successfully.'
```

### Step 2: Create Table

```sql
%%vdksql
CREATE TABLE backup_employees (
    EmployeeId INTEGER,
    LastName   NVARCHAR,
    FirstName  NVARCHAR,
    Title      NVARCHAR,
    ReportsTo  INTEGER,
    BirthDate  NVARCHAR,
    HireDate   NVARCHAR,
    Address    NVARCHAR,
    City       NVARCHAR,
    State      NVARCHAR,
    Country    NVARCHAR,
    PostalCode NVARCHAR,
    Phone      NVARCHAR,
    Fax        NVARCHAR,
    Email      NVARCHAR
);
```

```text
2025-01-28 12:01:38,397 [VDK] [INFO ] vdk.internal.builtin_plugins.c    managed_cursor.py :201  _execute_operati - Executing query:
-- job_name: content
-- op_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698
 select 1 -- Testing if connection is alive.
2025-01-28 12:01:38,406 [VDK] [INFO ] vdk.internal.builtin_plugins.c    managed_cursor.py :103  execute          - Executing query SUCCEEDED. Query duration 00h:00m:00s
2025-01-28 12:01:38,408 [VDK] [INFO ] vdk.internal.builtin_plugins.c    managed_cursor.py :201  _execute_operati - Executing query:
-- job_name: content
-- op_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698
CREATE TABLE backup_employees (
    EmployeeId INTEGER,
    LastName   NVARCHAR,
    FirstName  NVARCHAR,
    Title      NVARCHAR,
    ReportsTo  INTEGER,
    BirthDate  NVARCHAR,
    HireDate   NVARCHAR,
    Address    NVARCHAR,
    City       NVARCHAR,
    State      NVARCHAR,
    Country    NVARCHAR,
    PostalCode NVARCHAR,
    Phone      NVARCHAR,
    Fax  

'Query statement executed successfully.'
```
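The column list above mirrors the source `employees` table by hand. As an alternative, a hypothetical helper could derive it from the source schema with SQLite's `PRAGMA table_info`, which returns one row per column (`cid`, `name`, `type`, `notnull`, `dflt_value`, `pk`). This sketch copies only column names and declared types, not keys, defaults, or constraints, and assumes column names that need no quoting. The types it emits are the source's own declared types, so they may differ from the hand-written statement.

```python
import sqlite3


def create_table_ddl(connection: sqlite3.Connection, source: str, backup: str) -> str:
    """Hypothetical helper: build CREATE TABLE for `backup` with the column
    names and declared types of `source` (no keys, defaults, or constraints)."""
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
    columns = connection.execute(f"PRAGMA table_info({source})").fetchall()
    column_defs = ",\n".join(
        f"    {name} {declared_type}" for _cid, name, declared_type, *_rest in columns
    )
    return f"CREATE TABLE {backup} (\n{column_defs}\n);"
```

With a `sqlite3` connection to `chinook.db`, you could print `create_table_ddl(connection, "employees", "backup_employees")` and paste the result into a `%%vdksql` cell.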

### Step 3: Ingest to Table

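```python
import sqlite3

# If chinook.db is not in your current directory, replace "chinook.db"
# with the path to your chinook.db file.
db_connection = sqlite3.connect("chinook.db")
try:
    cursor = db_connection.cursor()
    cursor.execute("SELECT * FROM employees")
    # Pass the cursor as the iterable of rows; cursor.description holds one
    # tuple per result column, with the column name as its first item.
    job_input.send_tabular_data_for_ingestion(
        cursor,
        column_names=[column_info[0] for column_info in cursor.description],
        destination_table="backup_employees",
    )
finally:
    # Release the source connection once the rows have been sent
    db_connection.close()
```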

```text
2025-01-28 12:01:38,457 [VDK] [INFO ] vdk.internal.builtin_plugins.i   ingester_router.py :105  send_tabular_dat - Sending tabular data for ingestion with method: sqlite and target: None
2025-01-28 12:01:40,467 [VDK] [INFO ] vdk.plugin.sqlite.ingest_to_sq  ingest_to_sqlite.py :76   ingest_payload   - Ingesting payloads for target: chinook.db; collection_id: content|2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698
2025-01-28 12:01:40,470 [VDK] [INFO ] vdk.plugin.sqlite.sqlite_conne sqlite_connection.py :29   new_connection   - Creating new connection against local file database located at: chinook.db
```
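`send_tabular_data_for_ingestion` receives the rows and the column names separately. `cursor.description` is the DB-API attribute that describes the result columns: one 7-item tuple per column with the name first (`sqlite3` sets the other six items to `None`). The sketch below uses an in-memory database with two of the `employees` columns as a stand-in for `chinook.db` to show what the `column_names` argument evaluates to and how each row lines up with it by position. Pairing them with `zip` only illustrates that alignment; it is not VDK's internal code.

```python
import sqlite3

# In-memory stand-in for chinook.db, with two of the employees columns
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE employees (EmployeeId INTEGER, LastName NVARCHAR)")
connection.executemany(
    "INSERT INTO employees VALUES (?, ?)", [(1, "Adams"), (2, "Edwards")]
)

cursor = connection.execute("SELECT * FROM employees ORDER BY EmployeeId")
# Same expression as in the Step 3 cell: the first item of each description tuple
column_names = [column_info[0] for column_info in cursor.description]
print(column_names)  # ['EmployeeId', 'LastName']

# Each row is a tuple whose values line up with column_names by position
records = [dict(zip(column_names, row)) for row in cursor]
# prints [{'EmployeeId': 1, 'LastName': 'Adams'}, {'EmployeeId': 2, 'LastName': 'Edwards'}]
print(records)
connection.close()
```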


## 5. Results

After running the Data Job, we can check whether the new table was populated correctly by querying it with the ***%%vdksql*** cell magic, which runs the query against the configured SQLite database without having to set up a Data Job. (Outside a notebook, the **vdk-sqlite** plugin provides a similar **sqlite-query** command.)

```sql
%%vdksql
SELECT * FROM backup_employees
```

```text
2025-01-28 12:01:40,561 [VDK] [INFO ] vdk.internal.builtin_plugins.c    managed_cursor.py :201  _execute_operati - Executing query:
-- job_name: content
-- op_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698
 select 1 -- Testing if connection is alive.
2025-01-28 12:01:40,571 [VDK] [INFO ] vdk.internal.builtin_plugins.c    managed_cursor.py :103  execute          - Executing query SUCCEEDED. Query duration 00h:00m:00s
2025-01-28 12:01:40,577 [VDK] [INFO ] vdk.internal.builtin_plugins.c    managed_cursor.py :201  _execute_operati - Executing query:
-- job_name: content
-- op_id: 2fd9278f-abb2-41b5-b903-6c1762fa8cab-1738065698
SELECT * FROM backup_employees

2025-01-28 12:01:40,579 [VDK] [INFO ] vdk.internal.builtin_plugins.c    managed_cursor.py :103  execute          - Executing query SUCCEEDED. Query duration 00h:00m:00s
2025-01-28 12:01:40,585 [VDK] [INFO ] vdk.internal.builtin_plugins.c    managed_cursor.py :239  fetchall         - Fetching all results from query ...
2025-01-28 1
```

| | EmployeeId | LastName | FirstName | Title | ReportsTo | BirthDate | HireDate | Address | City | State | Country | PostalCode | Phone | Fax | Email |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Adams | Andrew | General Manager | | 1962-02-18 00:00:00 | 2002-08-14 00:00:00 | 11120 Jasper Ave NW | Edmonton | AB | Canada | T5K 2N1 | +1 (780) 428-9482 | +1 (780) 428-3457 | andrew@chinookcorp.com |
| 1 | 2 | Edwards | Nancy | Sales Manager | 1.0 | 1958-12-08 00:00:00 | 2002-05-01 00:00:00 | 825 8 Ave SW | Calgary | AB | Canada | T2P 2T3 | +1 (403) 262-3443 | +1 (403) 262-3322 | nancy@chinookcorp.com |
| 2 | 3 | Peacock | Jane | Sales Support Agent | 2.0 | 1973-08-29 00:00:00 | 2002-04-01 00:00:00 | 1111 6 Ave SW | Calgary | AB | Canada | T2P 5M5 | +1 (403) 262-3443 | +1 (403) 262-6712 | jane@chinookcorp.com |
| 3 | 4 | Park | Margaret | Sales Support Agent | 2.0 | 1947-09-19 00:00:00 | 2003-05-03 00:00:00 | 683 10 Street SW | Calgary | AB | Canada | T2P 5G3 | +1 (403) 263-4423 | +1 (403) 263-4289 | margaret@chinookcorp.com |
| 4 | 5 | Johnson | Steve | Sales Support Agent | 2.0 | 1965-03-03 00:00:00 | 2003-10-17 00:00:00 | 7727B 41 Ave | Calgary | AB | Canada | T3B 1Y7 | 1 (780) 836-9987 | 1 (780) 836-9543 | steve@chinookcorp.com |
| 5 | 6 | Mitchell | Michael | IT Manager | 1.0 | 1973-07-01 00:00:00 | 2003-10-17 00:00:00 | 5827 Bowness Road NW | Calgary | AB | Canada | T3B 0C5 | +1 (403) 246-9887 | +1 (403) 246-9899 | michael@chinookcorp.com |
| 6 | 7 | King | Robert | IT Staff | 6.0 | 1970-05-29 00:00:00 | 2004-01-02 00:00:00 | 590 Columbia Boulevard West | Lethbridge | AB | Canada | T1K 5N8 | +1 (403) 456-9986 | +1 (403) 456-8485 | robert@chinookcorp.com |
| 7 | 8 | Callahan | Laura | IT Staff | 6.0 | 1968-01-09 00:00:00 | 2004-03-04 00:00:00 | 923 7 ST NW | Lethbridge | AB | Canada | T1H 1Y8 | +1 (403) 467-3351 | +1 (403) 467-8772 | laura@chinookcorp.com |
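A small table like this one can be checked by eye. For larger tables, a sketch like the following compares the source and the backup directly with the `sqlite3` module; `tables_match` is a hypothetical helper. It uses SQL `EXCEPT` in both directions to find rows present in only one table, and it also compares row counts, because `EXCEPT` ignores duplicates and would not notice rows ingested twice (for example, if the ingestion step were re-run without dropping and recreating the table). It assumes both tables have the same columns in the same order, as they do here.

```python
import sqlite3


def tables_match(db_path: str, source: str, backup: str) -> bool:
    """Return True if both tables have the same distinct rows and the same row count."""
    connection = sqlite3.connect(db_path)
    try:
        # Rows of the source that are missing from the backup, and vice versa
        missing = connection.execute(
            f"SELECT * FROM {source} EXCEPT SELECT * FROM {backup}"
        ).fetchall()
        extra = connection.execute(
            f"SELECT * FROM {backup} EXCEPT SELECT * FROM {source}"
        ).fetchall()
        # EXCEPT removes duplicates, so compare row counts as well
        source_count, backup_count = (
            connection.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
            for table in (source, backup)
        )
    finally:
        connection.close()
    return not missing and not extra and source_count == backup_count
```

If the backup is an exact copy, `tables_match("chinook.db", "employees", "backup_employees")` is expected to return `True`. If it reports a mismatch, inspect the differing rows, since differences in how values are stored can also cause one.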


# Congratulations! 🎉

You've successfully completed the database-to-database ingestion guide with VDK! We hope you found it useful.
