# Use Spark Streaming, SQL, and ML with Iguazio
Spark users can access files, tables, or streams stored on the Iguazio data platform through the native Spark DataFrame interfaces. <br>
The Iguazio drivers for Spark implement the data-source API and support `predicate pushdown`: query filters are passed to the Iguazio database, which returns only the relevant data. This allows accelerated, high-speed access from Spark to data stored in the Iguazio DB. For more details, read the [Spark API documentation]().
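
To make the idea of predicate pushdown concrete, here is a toy model in plain Python. It is only a sketch: `ToySource`, its `rows_returned` counter, and the sample rows are made up for illustration and are not part of the Iguazio driver or of Spark.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]


class ToySource:
    """Hypothetical stand-in for a database; not the Iguazio driver."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows: List[Row] = list(rows)
        self.rows_returned = 0  # number of rows that left the "database"

    def scan(self, predicate: Optional[Callable[[Row], bool]] = None) -> Iterator[Row]:
        """Yield rows; if a predicate was pushed down, filter at the source."""
        for row in self._rows:
            if predicate is None or predicate(row):
                self.rows_returned += 1
                yield row


# Made-up sample rows (not taken from the stocks file).
rows = [
    {"Mnemonic": "AAA", "NumberOfTrades": 5},
    {"Mnemonic": "BBB", "NumberOfTrades": 11},
    {"Mnemonic": "CCC", "NumberOfTrades": 120},
]


def busy(row: Row) -> bool:
    return row["NumberOfTrades"] > 80


# Without pushdown: the source returns every row and the client filters.
no_pushdown = ToySource(rows)
client_side = [r for r in no_pushdown.scan() if busy(r)]

# With pushdown: the predicate travels to the source, which returns matches only.
with_pushdown = ToySource(rows)
source_side = list(with_pushdown.scan(busy))

print(no_pushdown.rows_returned, with_pushdown.rows_returned)
```

Both approaches produce the same result, but in the second one fewer rows leave the source, which is where the speed-up comes from.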

## Loading a file from AWS S3 into the Iguazio file system


In [None]:
%%sh 
mkdir -p /v3io/bigdata/examples
curl -L "deutsche-boerse-xetra-pds.s3.amazonaws.com/2018-03-26/2018-03-26_BINS_XETR07.csv" > /v3io/bigdata/examples/stocks.csv


## Initiating a Spark session 

In [None]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Iguazio Integration demo").getOrCreate()

## Reading the CSV file using a Spark DF

In [9]:
df = spark.read.option("inferSchema", "true").option("header", "true").csv('v3io://bigdata/examples/stocks.csv')
df.show()

+------------+--------+--------------------+------------+--------+----------+-------------------+-----+----------+--------+--------+--------+------------+--------------+
|        ISIN|Mnemonic|        SecurityDesc|SecurityType|Currency|SecurityID|               Date| Time|StartPrice|MaxPrice|MinPrice|EndPrice|TradedVolume|NumberOfTrades|
+------------+--------+--------------------+------------+--------+----------+-------------------+-----+----------+--------+--------+--------+------------+--------------+
|AT0000A0E9W5|    SANT|S+T AG (Z.REG.MK....|Common stock|     EUR|   2504159|2018-03-26 00:00:00|07:00|     20.56|   20.56|   20.56|   20.56|        1115|             5|
|DE000A0WMPJ6|    AIXA|  AIXTRON SE NA O.N.|Common stock|     EUR|   2504428|2018-03-26 00:00:00|07:00|    17.035|   17.08|   16.92|   16.98|        2892|            11|
|DE000A0Z2XN6|     RIB|RIB SOFTWARE SE  ...|Common stock|     EUR|   2504436|2018-03-26 00:00:00|07:00|     24.02|   24.18|   23.94|   24.12|        5
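
Note that the read above uses `v3io://bigdata/examples/stocks.csv`, while the shell cell wrote the same file to `/v3io/bigdata/examples/stocks.csv`. The helper below sketches that correspondence. The mapping rule is inferred only from the two paths used in this notebook, and `to_v3io_uri` is a hypothetical name, not a platform API.

```python
from pathlib import PurePosixPath

# Local mount point used by the shell cell in this notebook.
MOUNT_ROOT = PurePosixPath("/v3io")


def to_v3io_uri(local_path: str) -> str:
    """Translate a path under /v3io into the v3io:// form passed to Spark.

    Assumption: the first path component after /v3io is the container name
    ("bigdata" in this notebook). Raises ValueError for paths outside /v3io.
    """
    relative = PurePosixPath(local_path).relative_to(MOUNT_ROOT)
    return "v3io://" + relative.as_posix()


print(to_v3io_uri("/v3io/bigdata/examples/stocks.csv"))
# prints: v3io://bigdata/examples/stocks.csv
```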

## Writing the Spark DF into a table in Iguazio DB

In [None]:
# specify the DB index key using the key option (note the key must be unique)
df.write.format("io.iguaz.v3io.spark.sql.kv").mode("append").option("key", "ISIN").save("v3io://bigdata/examples/stocks_tab")
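
Because the key must be unique, it can help to check the chosen column for repeated values before writing. The following is a minimal sketch that uses only the Python standard library on the downloaded CSV file. `duplicate_keys` is a hypothetical helper, not part of Spark or of the Iguazio API.

```python
import csv
from collections import Counter
from typing import Dict


def duplicate_keys(csv_path: str, key: str) -> Dict[str, int]:
    """Return each value of column `key` that occurs more than once, with its count."""
    with open(csv_path, newline="") as handle:
        counts = Counter(row[key] for row in csv.DictReader(handle))
    return {value: n for value, n in counts.items() if n > 1}
```

For example, `duplicate_keys("/v3io/bigdata/examples/stocks.csv", "ISIN")` lists the `ISIN` values that appear on more than one row. An empty result means the column is unique in that file.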


## Reading a table via Spark DF

In [None]:
spark.read.format("io.iguaz.v3io.spark.sql.kv").load("v3io://bigdata/examples/stocks_tab").show()

## Using SparkSQL and converting to Pandas DataFrame

In [13]:
# Create a SQLContext from the SparkContext of the existing session
from pyspark.sql import SQLContext
sqlContext = SQLContext(spark.sparkContext, sparkSession=spark)

In [None]:
# Register the DataFrame as a temporary view
# (createOrReplaceTempView replaces the deprecated registerTempTable)
df.createOrReplaceTempView("mytable")

# Perform a simple select from the table
results = sqlContext.sql("select * from mytable where NumberOfTrades > 80")

# Convert the results to a Pandas DataFrame for easy viewing
results.toPandas()

# Running SQL queries with Presto
## Reading the stocks_tab table using SQL after it was written by the Spark DF


In [None]:
%sql select * from v3io.bigdata."/examples/stocks_tab" where tradedvolume > 20000

# Remove Data

In [None]:
!rm -rf /v3io/bigdata/examples/stocks*
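
The `stocks*` pattern matches both the downloaded `stocks.csv` file and the `stocks_tab` table. The same cleanup can be sketched in Python, assuming the table appears as a directory under the `/v3io` mount. `remove_matching` is a hypothetical helper name.

```python
import glob
import os
import shutil
from typing import List


def remove_matching(pattern: str) -> List[str]:
    """Delete every file or directory tree matching `pattern`; return the removed paths."""
    removed = []
    for path in glob.glob(pattern):
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # directory: remove it with all of its contents
        else:
            os.remove(path)  # regular file or symbolic link
        removed.append(path)
    return sorted(removed)
```

Calling `remove_matching("/v3io/bigdata/examples/stocks*")` mirrors the shell command above and returns the paths it deleted.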
