# Use Spark Streaming, SQL, and ML with Iguazio
Spark users can access files, tables, or streams stored on the Iguazio data platform through the native Spark DataFrame interfaces. <br>
The Iguazio drivers for Spark implement the data-source API and support `predicate pushdown`: query filters are passed to the Iguazio database, which returns only the relevant data. This accelerates access from Spark to data stored in the Iguazio DB. For more details, read the [Spark API documentation]().
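
To make the idea of predicate pushdown concrete, here is a toy model in plain Python. It is only a sketch of the concept, not the Iguazio driver's code: `ToySource`, `scan`, and `rows_sent` are hypothetical names, and the four rows reuse `Mnemonic` and `TradedVolume` values that appear in the outputs of this notebook.

```python
# Toy model of predicate pushdown (hypothetical names, not the Iguazio driver).
ROWS = [
    {"Mnemonic": "SANT", "TradedVolume": 1115},
    {"Mnemonic": "AIXA", "TradedVolume": 2892},
    {"Mnemonic": "EOAN", "TradedVolume": 20376},
    {"Mnemonic": "CBK", "TradedVolume": 23698},
]


class ToySource:
    """Stands in for a data source; counts how many rows it hands to the client."""

    def __init__(self, rows):
        self._rows = rows
        self.rows_sent = 0

    def scan(self, predicate=None):
        # When a predicate is supplied, the source filters before returning rows.
        out = [r for r in self._rows if predicate is None or predicate(r)]
        self.rows_sent += len(out)
        return out


def is_high_volume(row):
    return row["TradedVolume"] > 20000


# Without pushdown: every row is transferred, then filtered on the client.
no_push = ToySource(ROWS)
client_filtered = [r for r in no_push.scan() if is_high_volume(r)]

# With pushdown: the filter travels to the source, so fewer rows are transferred.
push = ToySource(ROWS)
source_filtered = push.scan(is_high_volume)

print(no_push.rows_sent, push.rows_sent)  # prints: 4 2
```

Both approaches return the same rows; the difference is how much data leaves the source.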

## Loading a file from AWS S3 into the Iguazio file system


In [1]:
%%sh 
mkdir -p /v3io/bigdata/examples
curl -L "deutsche-boerse-xetra-pds.s3.amazonaws.com/2018-03-26/2018-03-26_BINS_XETR07.csv" > /v3io/bigdata/examples/stocks.csv


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  975k  100  975k    0     0  5482k      0 --:--:-- --:--:-- --:--:-- 5511k


## Initiating a Spark session 

In [2]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Iguazio Integration demo").getOrCreate()

## Reading the CSV file using Spark DF

In [3]:
df = spark.read.option("inferSchema", "true").option("header", "true").csv('v3io://bigdata/examples/stocks.csv')
df.show()

+------------+--------+--------------------+------------+--------+----------+-------------------+-----+----------+--------+--------+--------+------------+--------------+
|        ISIN|Mnemonic|        SecurityDesc|SecurityType|Currency|SecurityID|               Date| Time|StartPrice|MaxPrice|MinPrice|EndPrice|TradedVolume|NumberOfTrades|
+------------+--------+--------------------+------------+--------+----------+-------------------+-----+----------+--------+--------+--------+------------+--------------+
|AT0000A0E9W5|    SANT|S+T AG (Z.REG.MK....|Common stock|     EUR|   2504159|2018-03-26 00:00:00|07:00|     20.56|   20.56|   20.56|   20.56|        1115|             5|
|DE000A0WMPJ6|    AIXA|  AIXTRON SE NA O.N.|Common stock|     EUR|   2504428|2018-03-26 00:00:00|07:00|    17.035|   17.08|   16.92|   16.98|        2892|            11|
|DE000A0Z2XN6|     RIB|RIB SOFTWARE SE  ...|Common stock|     EUR|   2504436|2018-03-26 00:00:00|07:00|     24.02|   24.18|   23.94|   24.12|        5
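
Note that the download cell writes to the local path `/v3io/bigdata/examples/stocks.csv`, while Spark reads the same file as `v3io://bigdata/examples/stocks.csv`. The small helper below sketches that mapping as it appears in this notebook. The function name and the default `/v3io` mount point are assumptions inferred from the two cells, not part of any Iguazio API.

```python
from pathlib import PurePosixPath


def fuse_path_to_v3io_url(path, mount="/v3io"):
    """Map '/v3io/<container>/<rest>' to 'v3io://<container>/<rest>'.

    Hypothetical helper; the mount point is an assumption taken from this notebook.
    """
    # relative_to raises ValueError if the path is not under the mount point.
    rel = PurePosixPath(path).relative_to(mount)
    return "v3io://" + rel.as_posix()


print(fuse_path_to_v3io_url("/v3io/bigdata/examples/stocks.csv"))
# prints: v3io://bigdata/examples/stocks.csv
```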

## Writing the Spark DF into a table in the Iguazio DB

In [4]:
# specify the DB index key using the key option (note the key must be unique)
df.write.format("io.iguaz.v3io.spark.sql.kv").mode("append").option("key", "ISIN").save("v3io://bigdata/examples/stocks_tab")
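
Because the comment in the cell above says the key must be unique, it can be worth checking the key column before writing. The following is a minimal sketch in plain Python using only the standard library. `duplicate_keys` is a hypothetical helper, and the three sample rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_file, key):
    """Return {key_value: count} for values of `key` that occur more than once."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[key] for row in reader)
    return {k: n for k, n in counts.items() if n > 1}


# Illustrative in-memory sample (made-up rows), so the sketch runs anywhere.
sample = io.StringIO(
    "ISIN,Time,TradedVolume\n"
    "DE000CBK1001,07:00,23698\n"
    "DE000CBK1001,07:01,100\n"
    "DE0005140008,07:00,24466\n"
)
print(duplicate_keys(sample, "ISIN"))  # prints: {'DE000CBK1001': 2}
```

To check the downloaded file itself, open it with `open("/v3io/bigdata/examples/stocks.csv", newline="")` and pass the file object to `duplicate_keys` with `"ISIN"` as the key. A non-empty result means the chosen column does not meet the uniqueness requirement.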


## Reading a table via Spark DF

In [5]:
spark.read.format("io.iguaz.v3io.spark.sql.kv").load("v3io://bigdata/examples/stocks_tab").show()

+------------+--------+--------------------+------------+--------+----------+-------------------+-----+----------+--------+--------+--------+------------+--------------+
|        ISIN|Mnemonic|        SecurityDesc|SecurityType|Currency|SecurityID|               Date| Time|StartPrice|MaxPrice|MinPrice|EndPrice|TradedVolume|NumberOfTrades|
+------------+--------+--------------------+------------+--------+----------+-------------------+-----+----------+--------+--------+--------+------------+--------------+
|FR0011475078|    JPNH|LYX.JAP.(TOPIX)(D...|         ETF|     EUR|   2505326|2018-03-26 00:00:00|07:36|   131.745| 131.745| 131.745| 131.745|         400|             1|
|US5951121038|     MTE|MICRON TECHN. INC...|Common stock|     EUR|   2506531|2018-03-26 00:00:00|07:04|      44.5|    44.5|    44.5|    44.5|        1000|             4|
|GB0000566504|     BIL|BHP BILLITON     ...|Common stock|     EUR|   2505369|2018-03-26 00:00:00|07:10|     15.96|  15.974|   15.96|  15.974|         

# Using SQL queries (with Presto)
## Reading the stocks_tab table using SQL after it was written by Spark DF


In [6]:
# run only once (load SQL magic)
%load_ext sql
%config SqlMagic.autocommit=False

In [7]:
%sql select * from v3io.bigdata."/examples/stocks_tab" where tradedvolume > 20000

Done.


securitydesc,securitytype,time,isin,minprice,date,endprice,numberoftrades,mnemonic,currency,securityid,maxprice,tradedvolume,startprice
XTR.MSCI BANGL.SWAP 1CDL,ETF,07:05,LU0659579220,0.8917,2018-03-26 00:00:00.000,0.8917,1,XBAN,EUR,2506042,0.8917,37000,0.8917
COMMERZBANK ETC UNL.,ETC,07:11,DE000ETC0308,0.284,2018-03-26 00:00:00.000,0.284,1,X0D2,EUR,2506314,0.284,30000,0.284
ISHSIV-FALL.A.H.Y.C.BDDLD,ETF,07:17,IE00BYM31M36,4.4031,2018-03-26 00:00:00.000,4.4031,1,QDVQ,EUR,2505524,4.4031,195000,4.4031
E.ON SE NA O.N.,Common stock,07:02,DE000ENAG999,8.978,2018-03-26 00:00:00.000,8.98,37,EOAN,EUR,2504666,8.996,20376,8.995
AMUNDI ETF MSCI EMER.MKTS,ETF,07:04,FR0010959676,4.1296,2018-03-26 00:00:00.000,4.1296,2,AMEM,EUR,2505311,4.1296,57117,4.1296
DK EO STOXX SEL.DIVID.30,ETF,07:11,DE000ETFL078,19.924,2018-03-26 00:00:00.000,19.924,2,EL4G,EUR,2506378,19.924,40599,19.924
"STEINHOFF INT.HLDG.EO-,50",Common stock,07:02,NL0011375019,0.2535,2018-03-26 00:00:00.000,0.2555,27,SNH,EUR,2506267,0.2596,166227,0.254
COMMERZBANK AG,Common stock,07:00,DE000CBK1001,11.04,2018-03-26 00:00:00.000,11.058,17,CBK,EUR,2504665,11.09,23698,11.09
DEUTSCHE BANK AG NA O.N.,Common stock,07:00,DE0005140008,11.358,2018-03-26 00:00:00.000,11.37,46,DBK,EUR,2504888,11.39,24466,11.39
BEATE UHSE AG,Common stock,07:03,DE0007551400,0.02,2018-03-26 00:00:00.000,0.0205,4,USE,EUR,2505107,0.0205,245598,0.02


# Remove Data

In [8]:
!rm -rf /v3io/bigdata/examples/stocks*
