# Iguazio Getting Started Example

This notebook contains code examples for performing common tasks to help you get started with the Iguazio Continous Data Platform

Follow the tutorial by running the paragraphs in order of appearance.

> **Tip:** You can also browse the files and directories that you write to the "users" container in this tutorial from the platform dashboard: in the side navigation menu, select **Data**, and then select the **users** container from the table. On the container data page, select the **Browse** tab, and then use the side directory-navigation tree to browse the directories. Selecting a file or directory in the browse table displays its metadata.


## Step 1: Load a sample CSV file from S3
Use `curl` to download a sample stock data from Iguazio public S3 bucket.<br>
This file belongs to deutsche-boerse public dataset.<br>
For additional public datasets check out (https://registry.opendata.aws/) <br>
<br>
Note that each user in the system has its own home directory (similar to linux home) that resides in a default container called users <br>
The environment variable V3IO_HOME points to the home directory of the logged in user<br>
All the notebooks examples store the data under the "examples" directory that resides under the user's home directory <br>
Iguaizo's best practice is to use the home directory of the user for keeping personal experiments and data in a private workspace <br>
However, to work on other folders and share data with other users you need to specify the exact path using the following convention /v3io/"data container name"/"path" <br>
V3io is the name of the iguazio data source library and it is being used to define iguazio as the storage layer for that read/write operation<br>


In [83]:
%%sh 
mkdir -p /v3io/${V3IO_HOME}/examples

# Download a sample stocks file from Iguazio demo bucket in S3
curl -L "iguazio-sample-data.s3.amazonaws.com/2018-03-26_BINS_XETR08.csv" > /v3io/${V3IO_HOME}/examples/stocks.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  861k  100  861k    0     0   894k      0 --:--:-- --:--:-- --:--:--  894k


## Step 2: Convert the sample CSV file to a NoSQL table

Read the sample stocks.csv file that you downloaded in Step 1 into a Spark DataFrame, and write the data in NoSQL format to a new stocks_nosql table 

Note: To use the Iguazio Spark Connector, set the data-source format to "io.iguaz.v3io.spark.sql.kv". <br>
The V3IO_HOME_URL is an environment varible that points to the Home directory of the user using Spark/Hadoop  format

In [1]:
import os
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Iguazio getting started").getOrCreate()

file_path=os.path.join(os.getenv('V3IO_HOME_URL')+'/examples')

# Read the sample stocks.csv file into a Spark DataFrame, and let Spark infer the schema of the CSV file
df = spark.read.option("header", "true").csv(os.path.join(file_path)+'/stocks.csv')

# Show the DataFrame data
df.show()

# Write the DataFrame data to a stocks_tab table under "users" container and define "ISIN" column as a key
df.write.format("io.iguaz.v3io.spark.sql.kv").mode("append").option("key", "ISIN").option("allow-overwrite-schema", "true").save(os.path.join(file_path)+'/stocks_tab/')


+------------+--------+--------------------+------------+--------+----------+----------+-----+----------+--------+--------+--------+------------+--------------+
|        ISIN|Mnemonic|        SecurityDesc|SecurityType|Currency|SecurityID|      Date| Time|StartPrice|MaxPrice|MinPrice|EndPrice|TradedVolume|NumberOfTrades|
+------------+--------+--------------------+------------+--------+----------+----------+-----+----------+--------+--------+--------+------------+--------------+
|CH0038389992|    BBZA|BB BIOTECH NAM.  ...|Common stock|     EUR|   2504244|2018-03-26|08:00|      56.4|    56.4|    56.4|    56.4|         320|             4|
|CH0038863350|    NESR|NESTLE NAM.      ...|Common stock|     EUR|   2504245|2018-03-26|08:00|     63.04|   63.06|      63|   63.06|         314|             3|
|LU0378438732|    C001|COMSTAGE-DAX UCIT...|         ETF|     EUR|   2504271|2018-03-26|08:00|    113.42|  113.42|  113.42|  113.42|         100|             1|
|LU0411075020|    DBPD|XTR.SHORTDA

## Step 3: Run interactive SQL queries

In [None]:
%sql select * from v3io.users."iguazio/examples/stocks_tab" where tradedvolume > 11000 order by tradedvolume

## Step 4: Convert the stocks_nosql table to a Parquet file

In [9]:
df.write.mode('overwrite').parquet(os.path.join(file_path)+'/stocks_prqt')


## Step 5: Display the content of the example container directory
Use hadoop fs to list the contents of the root directory under “users” container where all the example files are located
You should see in this directory the stocks.csv file and the stocks_nosql and stocks_prqt table directories.

In [10]:
!ls -lrt /v3io/${V3IO_HOME}/examples

total 0
drwxrwxrwx. 2 50 nogroup      0 Feb 24 08:04 stocks_tab
-rw-r--r--. 1 50 nogroup 882055 Feb 24 08:59 stocks.csv
drwxr-xr-x. 2 50 nogroup      0 Feb 24 09:19 stocks_prqt2
drwxr-xr-x. 2 50 nogroup      0 Feb 24 09:20 stocks_prqt


In [11]:
%%sh

# List the files and directories in the root directory of the "users" container using hadoop
hadoop fs -ls ${V3IO_HOME_URL}/examples

Found 4 items
-rw-r--r--   1 50 nogroup     882055 2019-02-24 08:59 v3io://users/iguazio/examples/stocks.csv
drwxr-xr-x   - 50 nogroup          0 2019-02-24 09:20 v3io://users/iguazio/examples/stocks_prqt
drwxr-xr-x   - 50 nogroup          0 2019-02-24 09:19 v3io://users/iguazio/examples/stocks_prqt2
drwxrwxrwx   - 50 nogroup          0 2019-02-24 08:04 v3io://users/iguazio/examples/stocks_tab


19/02/24 09:22:32 INFO slf_4j.Slf4jLogger: Slf4jLogger started


## Remove Data

In [70]:
# Delete all files under my example directory
!rm -rf /v3io/${V3IO_HOME}/examples/*

In order to release compute and memory resources taken by spark we recommend running the following command 

In [71]:
spark.stop()