# Welcome to the Iguazio Data Science Platform

1. [Platform Overview](#Platform-Overview)
2. [Data science workflow](#Data-science-workflow)
    1. [Data Collection and exploration](#Data-Collection-and-exploration)
    2. [Building and training models](#Building-and-training-models)
    3. [Models deployment in Production](#Models-deployment-in-Production)
3. [Visualization](#Visulization)
4. [End to end demos](#End-to-end-demos)
5. [Useful links](#Useful-links)
6. [Support](#Support)
7. [Others](#Others)


# Platform Overview

The Iguazio platform provides a fully integrated and secure data science PaaS with:

* Data science workbench (Jupyter with integrated analytics engines and Python packages) <br>
* Managed services over Kubernetes (e.g., Spark, Presto, Prometheus, and Grafana) <br>
* Fast data layer supporting SQL, NoSQL, time series, files/objects, and streaming <br>
* Real-time serverless functions framework (Nuclio) <br>

Customers can ingest, enrich, analyze, and serve data, all in one simple, fast, and secure platform. <br>
The platform simplifies the data scientist's work by providing a full data science environment running on a powerful cluster, allowing users to run distributed jobs and share resources (especially GPUs) efficiently. <br>
It also accelerates the deployment of a variety of analytics services, eliminating data-pipeline complexities and reducing the time to market for new applications with machine learning / AI capabilities.

[Detailed Platform overview](PlatformComponents.pdf) <br>
[Introduction video](https://www.youtube.com/watch?v=hR_Hv0_WcUw) <br>
[Serverless overview](https://github.com/nuclio/nuclio-jupyter/blob/master/README.md#installing)

<img src="IguazioDiagram.png">

# Data science workflow 

Iguazio provides all the building blocks for creating data science applications from research to production. <br> 

### Data Collection and exploration

[Collection and exploration](GettingStarted/GettingStarted.ipynb) 

Data collection: the platform offers several ways to ingest data from different sources: <br>
* Streaming engine (e.g., Kafka)<br>
* External database <br>
* Files/objects on S3 / Hadoop (e.g., CSV, Parquet)<br>
* Ingestion via REST API <br>

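As a rough illustration of the file-based path, the sketch below parses CSV text into Python records using only the standard library. The sample data, the column names (`device`, `timestamp`, `cpu_utilization`), and the `parse_metrics` helper are hypothetical and not part of the platform; in practice the text would be read from a file or object rather than an inline string.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file or object (e.g. fetched from S3).
SAMPLE_CSV = """device,timestamp,cpu_utilization
router-1,2019-01-01T00:00:00,41.5
router-2,2019-01-01T00:00:00,77.0
router-1,2019-01-01T00:01:00,43.0
"""


def parse_metrics(csv_text):
    """Parse CSV text into a list of dicts, converting the numeric column to float."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        row["cpu_utilization"] = float(row["cpu_utilization"])
        records.append(row)
    return records


records = parse_metrics(SAMPLE_CSV)
print(len(records), "records ingested")
```
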
Data Exploration and Processing

Iguazio provides a wide range of integrated tools for exploring and processing data. The most common ones are: <br>
* Spark: SQL, ML, R, Graph<br>
* Interactive SQL queries (Presto)<br>
* pandas DataFrames, or Dask for distributed, pandas-like processing<br>
* Frames: Iguazio's open-source, high-speed data-access library, which provides a unified interface for NoSQL tables, time-series tables, and streaming data. Among other things, Frames helps leverage GPUs for further acceleration (beyond Dask). <br>
* Built-in ML packages: scikit-learn, pyplot, NumPy, PyTorch, and TensorFlow. <br>

All the tools are integrated with Jupyter Notebook, allowing access to the same data through multiple tools and APIs. <br>
The Python environment comes with conda pre-deployed. Users can install any package using pip or conda.

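To give a feel for this kind of exploration, here is a minimal pandas sketch that could be run in a Jupyter notebook cell. The table contents and column names are made up for illustration; on the platform the DataFrame would more likely be loaded through one of the tools listed above.

```python
import pandas as pd

# Hypothetical metrics table; the column names are illustrative only.
df = pd.DataFrame(
    {
        "device": ["router-1", "router-2", "router-1", "router-2"],
        "cpu_utilization": [41.5, 77.0, 43.5, 81.0],
    }
)

# Average CPU utilization per device.
mean_cpu = df.groupby("device")["cpu_utilization"].mean()
print(mean_cpu)
```
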

### Building and training models

The following notebook shows an example of training a model on the Iguazio platform. <br>
The example is part of the end-to-end network operation demo (see the end-to-end demos below). <br>
[Training models](demos/netops/training.ipynb)

### Models deployment in Production

Deploying a model to a serving layer is often a major pain point. <br>
With the Iguazio platform, however, users can deploy their models in a few simple steps. <br>
The following page gives an overview of Nuclio and explains how to deploy your Python code from Jupyter as a serverless function: <br>
https://github.com/nuclio/nuclio-jupyter/blob/master/README.md#installing <br>
[Example of deploying the network operation model as a function](demos/netops/nuclio_infer.ipynb)


<a id="Visulization"></a>
# Visualization

Users can visualize data in Jupyter, for example by plotting charts with Matplotlib.<br>
Grafana dashboards can visualize real-time data from NoSQL and time-series tables. <br>
For information on how to create Grafana charts on the Iguazio platform, see:<br>
https://www.iguazio.com/docs/tutorials/latest-release/getting-started/trial-qs/grafana-dashboards/


# End to end demos

[FSI - Sentiment analysis for stocks](demos/stocks/read_stocks.ipynb) <br>
[Predictive analytics for network operation](demos/netops/generator.ipynb)

# Useful links

* [API reference](https://iguazio.com/docs/reference/latest-release/api-reference/)
* [Development ecosystem](https://www.iguazio.com/docs/intro/latest-release/ecosystem/)
* [10 Minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/10min.html)
* [JupyterLab tutorial](https://jupyterlab.readthedocs.io/en/stable/)

# Support
Our support team will be happy to help with any questions. <br>
Feel free to reach out to support@iguazio.com or use the chatbox for direct communication with our experts.

# Others

Sample datasets: http://iguazio-sample-data.s3.amazonaws.com/ <br>