# Using the V3IO Frames Library for High-Performance Data Access 

- [Overview](#frames-overview)
- [Initialization](#frames-init)
- [Working with NoSQL Tables (kv Backend)](#frames-kv)
- [Working with Time-Series Databases (tsdb Backend)](#frames-tsdb)
- [Working with Streams (stream Backend)](#frames-stream)
- [Cleanup](#frames-cleanup)

<a id="frames-overview"></a>
## Overview

[V3IO Frames](https://github.com/v3io/frames) (**"Frames"**) is a multi-model open-source data-access library, developed by Iguazio, which provides a unified high-performance DataFrame API for working with data in the data store of the Iguazio Data Science Platform (**"the platform"**).
Frames currently supports the NoSQL (key/value), stream, and time-series (TSDB) data models via its `kv`, `stream`, and `tsdb` backends.

To use Frames, you first need to import the **v3io_frames** library and create and initialize a client object &mdash; an instance of the `Client` class.<br>
The `Client` class features the following object methods for supporting basic data operations; the type of data is derived from the backend type (`tsdb` &mdash; TSDB table / `kv` &mdash; NoSQL table / `stream` &mdash; data stream):

- `create` &mdash; creates a new TSDB table or stream ("backend data").
- `delete` &mdash; deletes a table or stream.
- `read` &mdash; reads data from a table or stream into pandas DataFrames.
- `write` &mdash; writes data from pandas DataFrames to a table or stream.
- `execute` &mdash; executes a command on a table or stream.
  Each backend may support multiple commands.

For a detailed description of the Frames API, see the [Frames API reference](https://www.iguazio.com/docs/reference/latest-release/api-reference/frames/).<br>
For more help and usage details, use the internal API help &mdash; `<client object>.<command>?` in Jupyter Notebook or `print(<client object>.<command>.__doc__)`.<br>
For example, the following command returns information about the read operation for a client object named `client`:
```
client.read?
```
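
Outside Jupyter Notebook, the same help text can be retrieved with Python's standard `inspect` module. The following sketch uses a stand-in function (`read_stub`, a hypothetical name) in place of `client.read` so that it runs without a Frames service; `help_text` is likewise an illustrative helper and not part of Frames.

```python
import inspect


def read_stub(backend, table="", **kwargs):
    """Stand-in for client.read, used only to demonstrate the help lookup."""
    return None


def help_text(func):
    # inspect.getdoc returns the cleaned-up docstring, or None if there is none
    return inspect.getdoc(func) or "<no documentation available>"


print(help_text(read_stub))
```

With a real client object, `help_text(client.read)` would return the same text that `client.read?` displays in Jupyter Notebook.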

<a id="frames-init"></a>
## Initialization

To use V3IO Frames, first ensure that your platform tenant has a shared tenant-wide instance of the V3IO Frames service.
This can be done by a platform service administrator from the **Services** dashboard page.<br>
Then, import the required libraries and create a Frames client object (an instance of the `Client` class), as demonstrated in the following code, which creates a client object named `client`.

> **Note:**
> - The client constructor's `container` parameter is set to `"users"` for accessing data in the platform's "users" data container.
> - Because no authentication credentials are passed to the constructor, Frames will use the access key that's assigned to the `V3IO_ACCESS_KEY` environment variable.
>   The platform's Jupyter Notebook service defines this variable automatically and initializes it to a valid access key for the running user of the service.
>   You can pass different credentials by using the constructor's `token` parameter (platform access key) or `user` and `password` parameters (platform username and password).

In [1]:
import pandas as pd
import v3io_frames as v3f
import os

# Create a Frames client
client = v3f.Client("framesd:8081", container="users")
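
The credential rules described in the preceding note can be made explicit with a small helper. This is only a sketch: `frames_client_kwargs` is a hypothetical helper (not part of Frames), and it relies solely on the `container`, `token`, `user`, and `password` constructor parameters and the `V3IO_ACCESS_KEY` variable named in the note.

```python
import os


def frames_client_kwargs(container="users", token=None, user=None,
                         password=None, env=None):
    """Build keyword arguments for v3f.Client (hypothetical helper)."""
    env = os.environ if env is None else env
    kwargs = {"container": container}
    if token:
        kwargs["token"] = token            # explicit platform access key
    elif user and password:
        kwargs["user"] = user              # explicit username and password
        kwargs["password"] = password
    elif not env.get("V3IO_ACCESS_KEY"):
        # Nothing explicit and nothing in the environment for Frames to use
        raise ValueError("set V3IO_ACCESS_KEY or pass token / user+password")
    return kwargs


# The platform's Jupyter service predefines V3IO_ACCESS_KEY, so the defaults
# are enough there; a fake environment is used here to keep this runnable.
print(frames_client_kwargs(env={"V3IO_ACCESS_KEY": "example-key"}))
```

The result can then be passed to the constructor, for example `v3f.Client("framesd:8081", **frames_client_kwargs())`. Failing early with a clear message is a design choice of this sketch, not documented Frames behavior.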


<a id='frames-kv'></a>
## Working with NoSQL Tables (kv Backend)

This section demonstrates how to use the `kv` Frames backend to write and read NoSQL data in the platform.

- [Initialization](#frames-kv-init)
- [Load Data from Amazon S3](#frames-kv-load-data-s3)
- [Write to a NoSQL Table](#frames-kv-write)
- [Read from the Table Using an SQL Query](#frames-kv-read-sql-query)
- [Read from the Table Using the Frames API](#frames-kv-read-frames-api)
  - [Read Using a Single DataFrame](#frames-kv-read-frames-api-single-df)
  - [Read Using a DataFrames Iterator (Streaming)](#frames-kv-read-frames-api-df-iterator)
- [Delete the NoSQL Table](#frames-kv-delete)

<a id="frames-kv-init"></a>
### Initialization

Start out by defining table-path variables that will be used in the tutorial's code examples.<br>
The table path (`table`) is relative to the configured parent data container; see [Write to a NoSQL Table](#frames-kv-write).

In [2]:
# Relative path to the NoSQL table within the parent platform data container
table = os.path.join(os.getenv("V3IO_USERNAME"), "examples/bank")

# Full path to the NoSQL table for SQL queries (platform Presto data-path syntax);
# use the same data container as used for the Frames client ("users")
sql_table_path = 'v3io.users."' + table + '"'
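
The two path variables follow a fixed pattern, which the following sketch wraps in a function. `table_paths` is a hypothetical helper; it uses `posixpath.join` (an editorial choice, so that the separator is always `/`) and rejects an empty user name, because `os.getenv("V3IO_USERNAME")` returns `None` when the variable isn't set.

```python
import posixpath


def table_paths(username, rel_path="examples/bank", container="users"):
    """Return (Frames table path, Presto table path) for a user's table."""
    if not username:
        # os.getenv("V3IO_USERNAME") returns None outside the platform
        raise ValueError("username is empty; is V3IO_USERNAME set?")
    table = posixpath.join(username, rel_path)
    # Presto data-path syntax: v3io.<container name>."<path to table>"
    sql_table_path = f'v3io.{container}."{table}"'
    return table, sql_table_path


print(table_paths("iguazio"))
```

For user "iguazio", this yields the relative path `iguazio/examples/bank` and the Presto path `v3io.users."iguazio/examples/bank"`.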

<a id="frames-kv-load-data-s3"></a>
### Load Data from Amazon S3

Read a file from an Amazon Simple Storage Service (S3) bucket into a pandas DataFrame.

In [3]:
# Read an AWS S3 file into a DataFrame and show its first rows
df = pd.read_csv("https://s3.amazonaws.com/iguazio-sample-data/bank.csv", sep=";")
df.head()

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
0,30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
1,33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
2,35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no
3,30,management,married,tertiary,no,1476,yes,yes,unknown,3,jun,199,4,-1,0,unknown,no
4,59,blue-collar,married,secondary,no,0,yes,no,unknown,5,may,226,1,-1,0,unknown,no


<a id="frames-kv-write"></a>
### Write to a NoSQL Table

Use the `write` method of the Frames client with the `kv` backend to write the data that was read in the previous step to a NoSQL table.<br>
The mandatory `table` parameter specifies the relative table path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).
In the following example, the relative table path is set by using the `table` variable that was defined in the [kv backend initialization](#frames-kv-init) step.<br>
The `dfs` parameter can be set either to a single DataFrame (as done in the following example) or to multiple DataFrames &mdash; either as a DataFrames iterator or as a list of DataFrames.

In [4]:
client.write("kv", table=table, dfs=df)
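
Because `dfs` also accepts a DataFrames iterator, a large DataFrame can be split into smaller pieces before it's written. The following sketch shows one way to produce such an iterator with plain pandas; `iter_chunks` is a hypothetical helper and the sample data is made up for illustration.

```python
import pandas as pd


def iter_chunks(df, rows_per_chunk):
    """Yield consecutive slices of `df` with at most `rows_per_chunk` rows."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be positive")
    for start in range(0, len(df), rows_per_chunk):
        yield df.iloc[start:start + rows_per_chunk]


sample = pd.DataFrame({"age": [30, 33, 35, 30, 59],
                       "balance": [1787, 4789, 1350, 1476, 0]})
chunks = list(iter_chunks(sample, 2))
print([len(chunk) for chunk in chunks])
```

Under these assumptions, a call such as `client.write("kv", table=table, dfs=iter_chunks(df, 1000))` would pass the chunks to Frames one at a time; the chunk size of 1000 rows is arbitrary.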

<a id="frames-kv-read-sql-query"></a>
### Read from the Table Using an SQL Query

You can run SQL queries on your NoSQL table (using Presto) to offload data filtering, grouping, joins, etc. to a scale-out high-speed database engine.

> **Note:** To query a table in a platform data container, the table path in the `from` section of the SQL query should be of the format `v3io.<container name>."/path/to/table"`.
> See [Presto Data Paths](https://www.iguazio.com/docs/tutorials/latest-release/getting-started/fundamentals/#data-paths-presto) in the platform documentation.
> In the following example, the path is set by using the `sql_table_path` variable that was defined in the [kv backend initialization](#frames-kv-init) step.
> Unless you changed the code, this variable translates to `v3io.users."<running user>/examples/bank"`; for example, `v3io.users."iguazio/examples/bank"` for user "iguazio".

In [5]:
%sql select * from $sql_table_path where balance > 10000 limit 8

Done.


loan,education,previous,housing,poutcome,duration,marital,default,balance,month,contact,campaign,y,idx,job,day,age,pdays
no,secondary,0,yes,unknown,249,married,no,19317,aug,cellular,1,yes,3553,retired,4,68,-1
no,secondary,0,yes,unknown,115,single,no,13683,jun,unknown,3,no,3878,blue-collar,3,34,-1
no,primary,0,no,unknown,323,single,no,11262,aug,cellular,1,yes,368,technician,26,60,-1
no,tertiary,4,yes,failure,8,married,no,22546,may,cellular,6,no,3332,management,14,31,267
no,secondary,0,yes,unknown,352,married,no,16063,may,unknown,3,no,4369,technician,30,57,-1
no,primary,0,no,unknown,106,divorced,no,10924,may,cellular,2,no,339,self-employed,6,51,-1
no,tertiary,0,yes,unknown,71,married,no,27359,jun,unknown,2,no,1881,management,3,36,-1
no,secondary,0,no,unknown,125,single,no,11555,apr,cellular,2,no,561,student,8,28,-1


<a id="frames-kv-read-frames-api"></a>
### Read from the Table Using the Frames API

Use the `read` method of the Frames client with the `kv` backend to read data from your NoSQL table.<br>
The `read` method can return a DataFrame or a DataFrames iterator (a stream), as demonstrated in the following examples.

- [Read Using a Single DataFrame](#frames-kv-read-frames-api-single-df)
- [Read Using a DataFrames Iterator (Streaming)](#frames-kv-read-frames-api-df-iterator)

<a id="frames-kv-read-frames-api-single-df"></a>
#### Read Using a Single DataFrame

The following example uses a single command to read data from the NoSQL table into a DataFrame.

In [6]:
df = client.read(backend="kv", table=table, filter="balance > 20000")
df.head(8)

idx,age,balance,campaign,contact,day,default,duration,education,housing,job,loan,marital,month,pdays,poutcome,previous,y
871,31,26965,2,cellular,21,no,654,primary,no,housemaid,no,single,apr,-1,unknown,0,yes
650,33,23663,2,cellular,16,no,199,tertiary,yes,housemaid,no,single,apr,146,failure,2,no
1483,43,27733,7,unknown,3,no,164,tertiary,yes,technician,no,single,jun,-1,unknown,0,no
3332,31,22546,6,cellular,14,no,8,tertiary,yes,management,no,married,may,267,failure,4,no
1881,36,27359,2,unknown,3,no,71,tertiary,yes,management,no,married,jun,-1,unknown,0,no
2989,42,42045,2,cellular,8,no,205,tertiary,no,entrepreneur,no,married,aug,-1,unknown,0,no
3830,57,27069,3,unknown,20,no,174,tertiary,no,technician,yes,married,jun,-1,unknown,0,no
3231,29,22171,1,cellular,18,no,44,secondary,yes,admin.,no,married,may,355,failure,3,no


<a id="frames-kv-read-frames-api-df-iterator"></a>
#### Read Using a DataFrames Iterator (Streaming)

The following example uses a DataFrames iterator to stream data from the NoSQL table into multiple DataFrames and allow concurrent data movement and processing.<br>
The example sets the `iterator` parameter to `True` to receive a DataFrames iterator (instead of the default single DataFrame), and then iterates the DataFrames in the returned iterator; you can also use `concat` instead of iterating the DataFrames.

> **Note:** Iterators work with all Frames backends and can be used as input to write functions that support this, such as the `write` method of the Frames client.
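
As an alternative to iterating, the DataFrames of an iterator can be combined with `pandas.concat`. The following sketch uses a generator (`fake_read`, a hypothetical stand-in for `client.read(..., iterator=True)`) so that it runs without a Frames service; `collect` is likewise an illustrative helper.

```python
import pandas as pd


def collect(dfs):
    """Concatenate an iterable of DataFrames; return an empty frame if none."""
    frames = list(dfs)
    # pd.concat raises ValueError on an empty list, so guard against it
    return pd.concat(frames) if frames else pd.DataFrame()


def fake_read():
    # Stand-in for a DataFrames iterator returned by the Frames client
    yield pd.DataFrame({"balance": [23663, 26965]},
                       index=pd.Index([650, 871], name="idx"))
    yield pd.DataFrame({"balance": [27359]},
                       index=pd.Index([1881], name="idx"))


combined = collect(fake_read())
print(combined)
```

Note that concatenation holds the full result in memory, so it gives up the main benefit of streaming for large tables.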

In [7]:
dfs = client.read(backend="kv", table=table, filter="balance > 20000",
                  iterator=True)
for df in dfs:
    print(df.head())

      age  balance  campaign   contact  day default  duration education  \
idx                                                                       
650    33    23663         2  cellular   16      no       199  tertiary   
871    31    26965         2  cellular   21      no       654   primary   
1881   36    27359         2   unknown    3      no        71  tertiary   
2989   42    42045         2  cellular    8      no       205  tertiary   
1483   43    27733         7   unknown    3      no       164  tertiary   

     housing           job loan  marital month  pdays poutcome  previous    y  
idx                                                                            
650      yes     housemaid   no   single   apr    146  failure         2   no  
871       no     housemaid   no   single   apr     -1  unknown         0  yes  
1881     yes    management   no  married   jun     -1  unknown         0   no  
2989      no  entrepreneur   no  married   aug     -1  unknown         0   no  
1483     yes    technician   no   single   jun     -1  unknown         0   no  

<a id="frames-kv-delete"></a>
### Delete the NoSQL Table

Use the `delete` method of the Frames client with the `kv` backend to delete the NoSQL table that was used in the previous steps.

In [8]:
# Delete the `table` NoSQL table
client.delete("kv", table)

<a id='frames-tsdb'></a>
## Working with Time-Series Databases (tsdb Backend)

This section demonstrates how to use the `tsdb` Frames backend to create a time-series database (TSDB) table in the platform, ingest data into the table, and read from the table (i.e., submit TSDB queries).

- [Initialization](#frames-tsdb-init)
- [Create a TSDB Table](#frames-tsdb-create)
- [Write to the TSDB Table](#frames-tsdb-write)
- [Read from the TSDB Table](#frames-tsdb-read)
  - [Conditional Read](#frames-tsdb-read-conditional)
- [Delete the TSDB Table](#frames-tsdb-delete)

<a id="frames-tsdb-init"></a>
### Initialization

Start out by defining a TSDB table-path variable that will be used in the tutorial's code examples.<br>
The table path (`tsdb_table`) is relative to the configured parent data container; see [Create a TSDB Table](#frames-tsdb-create).

In [9]:
# Relative path to the TSDB table within the parent platform data container
tsdb_table = os.path.join(os.getenv("V3IO_USERNAME"), "examples/tsdb_tab")

<a id="frames-tsdb-create"></a>
### Create a TSDB Table

Use the `create` method of the Frames client with the `tsdb` backend to create a new TSDB table.<br>
The mandatory `table` parameter specifies the relative table path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).
In the following example, the relative table path is set by using the `tsdb_table` variable that was defined in the [tsdb backend initialization](#frames-tsdb-init) step.<br>
You must set the `rate` argument to the ingestion rate of the TSDB metric samples, as `"[0-9]+/[smh]"` (where '`s`' = seconds, '`m`' = minutes, and '`h`' = hours); for example, `1/s` (one sample per second).
It's recommended that you set the rate to the average expected ingestion rate, and that the ingestion rates for a given TSDB table don't vary significantly; when there's a big difference in the ingestion rates (for example, x10), use separate TSDB tables.
You can also set additional optional arguments, such as `aggregates` or `aggregation-granularity`.
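
The rate format and the "x10" rule of thumb can be checked before a table is created. The following is a minimal sketch that uses only the standard library; `samples_per_second` and `needs_separate_tables` are hypothetical helpers, and the factor of 10 is taken from the recommendation above rather than from any Frames API.

```python
import re

_RATE_RE = re.compile(r"([0-9]+)/([smh])")
_SECONDS = {"s": 1, "m": 60, "h": 3600}


def samples_per_second(rate):
    """Convert a rate string such as "1/s" or "12/h" to samples per second."""
    match = _RATE_RE.fullmatch(rate)
    if match is None:
        raise ValueError(f"invalid rate {rate!r}; expected e.g. '1/s'")
    count, unit = match.groups()
    return int(count) / _SECONDS[unit]


def needs_separate_tables(rate_a, rate_b, factor=10):
    """True if the two rates differ by `factor` or more (rule of thumb)."""
    a, b = samples_per_second(rate_a), samples_per_second(rate_b)
    if a == 0 or b == 0:
        return a != b
    return max(a, b) / min(a, b) >= factor


print(samples_per_second("1/m"))
print(needs_separate_tables("1/s", "1/h"))
```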

In [20]:
# Create a new TSDB table; ingestion rate = one sample per hour ("1/h")
client.create(backend="tsdb", table=tsdb_table, rate="1/h")

<a id="frames-tsdb-write"></a>
### Write to the TSDB Table

Use the `write` method of the Frames client with the `tsdb` backend to ingest data from a pandas DataFrame into your TSDB table.<br>
The primary-key attribute of platform TSDB tables (i.e., the DataFrame index column) must hold the sample time of the data (displayed as `time` in read outputs).<br>
In addition, TSDB table items (rows) can optionally have sub-index columns (attributes) that are called labels.
You can add labels to TSDB table items in one of two ways; you can also combine these methods:

- Use the `labels` dictionary parameter of the `write` method to add labels to all the written metric-sample table items (DataFrame rows) &mdash; `{<label>: <value>[, <label>: <value>, ...]}`.<br>
  For example, `{"node": "11", "os": "linux"}`.
  Note that the label values must be provided as strings.
- Define DataFrame index columns for the labels.
  All DataFrame index columns except for the sample-time index column are automatically converted into labels for the respective table items.
  > **Note:** If you wish to use regular columns in your DataFrames as metric labels, convert these columns to index columns.
  > The following example converts the `symbol` and `exchange` columns to index columns that will be used as metric labels (in addition to the `time` index column):<br>
  > ```python
  > df.index.name="time"                              # Name the sample-time index column "time"
  > df.reset_index(level=0, inplace=True)             # Reset the DataFrame indexes
  > df = df.set_index(["time", "symbol", "exchange"]) # Define the time and label columns as index columns
  > ```

In [12]:
import numpy as np
from datetime import datetime, timedelta


# Generate a DataFrame with TSDB metric samples and a "time" index column
def gen_df_w_tsdb_data(num_items=24, freq="1H", end=None, start=None,
                       start_delta=None, tz=None, normalize=False, zero=False,
                       attrs=["cpu", "mem", "disk"]):
    if (start is None and start_delta is not None and end is not None):
        start = end - timedelta(days=start_delta)
    if (zero):
        if (end is not None):
            end = end.replace(minute=0, second=0, microsecond=0)
        if (start is not None):
            start = start.replace(minute=0, second=0, microsecond=0)
    # If `start`, `end`, `num_items` (date_range() `periods`), and `freq`
    # are set, ignore `freq`
    if (freq is not None and start is not None and end is not None and
            num_items is not None):
        freq = None
    times = pd.date_range(periods=num_items, freq=freq, start=start, end=end,
                          tz=tz, normalize=normalize)
    data = np.random.rand(num_items, len(attrs)) * 100
    df = pd.DataFrame(data, index=times, columns=attrs)
    df.index.name = "time"
    return df


In [13]:
# Prepare DataFrames with randomly generated metric samples
end_t = datetime.now()
start_delta = 7  # start time = end_t - 7 days
dfs = []
for i in range(4):
    # Generate a new DataFrame with TSDB metrics
    dfs.append(gen_df_w_tsdb_data(end=end_t, start_delta=start_delta,
                                  zero=True))
    # Display DataFrame info & head (optional - for testing)
    print("\n** dfs[" + str(i) + "] **")
    display(dfs[i].info(), dfs[i].head())



** dfs[0] **
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 24 entries, 2020-01-03 13:00:00 to 2020-01-10 13:00:00
Data columns (total 3 columns):
cpu     24 non-null float64
mem     24 non-null float64
disk    24 non-null float64
dtypes: float64(3)
memory usage: 768.0 bytes


None

time,cpu,mem,disk
2020-01-03 13:00:00.000000000,0.656822,25.343254,8.574281
2020-01-03 20:18:15.652173913,81.027776,69.744752,51.079915
2020-01-04 03:36:31.304347826,69.877973,85.808604,37.40712
2020-01-04 10:54:46.956521739,49.965332,19.357673,54.455909
2020-01-04 18:13:02.608695652,20.998817,11.521681,44.940804



** dfs[1] **
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 24 entries, 2020-01-03 13:00:00 to 2020-01-10 13:00:00
Data columns (total 3 columns):
cpu     24 non-null float64
mem     24 non-null float64
disk    24 non-null float64
dtypes: float64(3)
memory usage: 768.0 bytes


None

time,cpu,mem,disk
2020-01-03 13:00:00.000000000,95.401473,56.811962,1.309784
2020-01-03 20:18:15.652173913,88.432183,94.093034,42.480825
2020-01-04 03:36:31.304347826,24.214176,97.490881,22.358616
2020-01-04 10:54:46.956521739,40.411803,55.000451,31.001184
2020-01-04 18:13:02.608695652,81.065654,62.942572,94.047812



** dfs[2] **
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 24 entries, 2020-01-03 13:00:00 to 2020-01-10 13:00:00
Data columns (total 3 columns):
cpu     24 non-null float64
mem     24 non-null float64
disk    24 non-null float64
dtypes: float64(3)
memory usage: 768.0 bytes


None

time,cpu,mem,disk
2020-01-03 13:00:00.000000000,36.729845,23.903441,70.830602
2020-01-03 20:18:15.652173913,89.644225,27.976424,79.411445
2020-01-04 03:36:31.304347826,54.789899,14.526948,25.347536
2020-01-04 10:54:46.956521739,25.784494,65.162713,54.963946
2020-01-04 18:13:02.608695652,94.760468,7.512315,42.226179



** dfs[3] **
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 24 entries, 2020-01-03 13:00:00 to 2020-01-10 13:00:00
Data columns (total 3 columns):
cpu     24 non-null float64
mem     24 non-null float64
disk    24 non-null float64
dtypes: float64(3)
memory usage: 768.0 bytes


None

time,cpu,mem,disk
2020-01-03 13:00:00.000000000,20.465784,55.85214,48.072533
2020-01-03 20:18:15.652173913,91.673934,65.263198,29.900068
2020-01-04 03:36:31.304347826,11.385613,28.062709,92.981227
2020-01-04 10:54:46.956521739,69.387996,32.479925,1.049438
2020-01-04 18:13:02.608695652,5.558903,9.629548,69.840307
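
The sample times in the preceding outputs don't fall on the hour because the generator function drops `freq` when `start`, `end`, and `num_items` are all set, so `date_range` spaces the 24 samples evenly across the 7-day range. The following sketch reproduces that spacing with the standard library only (it doesn't call pandas); the end time is taken from the outputs above.

```python
from datetime import datetime, timedelta

end = datetime(2020, 1, 10, 13, 0, 0)
start = end - timedelta(days=7)
num_items = 24

# N evenly spaced points span N - 1 intervals
interval = (end - start) / (num_items - 1)
times = [start + i * interval for i in range(num_items)]

print(interval)   # 7:18:15.652174
print(times[1])   # 2020-01-03 20:18:15.652174
```

The resulting average rate is about one sample every 7.3 hours per metric.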


In [14]:
# Write to a TSDB table

# Prepare metric labels to write
labels = [
    {"node": "11", "os": "linux"},
    {"node": "2", "os": "windows"},
    {"node": "11", "os": "windows"},
    {"node": "2", "os": "linux"}
]

# Write the contents of the prepared DataFrames to a TSDB table. Use multiple
# write commands with the `labels` parameter to set different label values.
num_dfs = len(dfs)
for i in range(num_dfs):
    client.write("tsdb", table=tsdb_table, dfs=dfs[i], labels=labels[i])
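
Label values passed in the `labels` dictionary must be strings, which is easy to get wrong when the values come from numeric variables. The following sketch shows one way to enforce this before calling `write`; `normalize_labels` is a hypothetical helper and not part of Frames.

```python
def normalize_labels(labels):
    """Return a copy of `labels` with every name and value as a string."""
    normalized = {}
    for name, value in labels.items():
        if value is None:
            raise ValueError(f"label {name!r} has no value")
        normalized[str(name)] = str(value)
    return normalized


print(normalize_labels({"node": 11, "os": "linux"}))
```

The returned dictionary could then be passed as the `labels` argument, for example `labels=normalize_labels({"node": 11, "os": "linux"})`.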


<a id="frames-tsdb-read"></a>
### Read from the TSDB Table

- [Overview and Basic Examples](#frames-tsdb-read-basic)
- [Conditional Read](#frames-tsdb-read-conditional)

<a id="frames-tsdb-read-basic"></a>
#### Overview and Basic Examples

Use the `read` method of the Frames client with the `tsdb` backend to read data from your TSDB table (i.e., query the database).<br>
You can perform one of two types of queries. You cannot combine the two types in a single call, and you also cannot mix raw sample-data queries with aggregation queries:

- **A non-SQL query** &mdash; set the `table` parameter to the path to the TSDB table, and optionally set additional method parameters to configure the query.
  `columns` defines the query metrics (default = all); `aggregators` defines aggregation functions ("aggregators") to execute for all the configured metrics; `filter` restricts the query by using a platform [filter expression](https://www.iguazio.com/docs/reference/latest-release/expressions/condition-expression/#filter-expression); and `group_by` allows grouping the results by specific metric labels.
- **An SQL query** \[Tech Preview\] &mdash; set the `query` parameter to an SQL query string of the following format:
  ```
  select <metrics | aggregators> from '<table path>' [where <filter expression>] [group by <labels>]
  ```
  > **Note:**
  > - In SQL queries, the path to the TSDB table is set in the `FROM` clause of the `query` string and not in the `read` method's `table` parameter.
  > - The `where` filter expression is similar to that passed to the `filter` parameter for a non-SQL query, except it's in SQL format, so the expression isn't embedded within quotation marks and comparisons are done by using the '`=`' operator instead of the '`==`' operator.
  > - The `select` clause can optionally include a comma-separated list of either over-time aggregators (such as `avg` or `sum`) or cross-series aggregators (such as `avg_all` or `sum_all`), but you cannot mix these aggregation types.
  >   The aggregation functions receive a metric-name parameter (for example, `avg(cpu)`, `avg_all(cpu)`, or `avg(*)` for all metrics).
  >   Cross-series aggregations functions can also optionally receive an interpolation function &mdash; `next` (default) | `prev` | `linear` | `none` &mdash; in which case the metric name is passed as a parameter of the interpolation function (and not as a direct parameter of the aggregation function); the interpolation function can also optionally receive an interpolation-tolerance string of the format `"[0-9]+[mhd]"` (for example, `avg_all(prev(cpu,'1h'))`).

For both types of queries, you can also optionally set additional parameters.
`start` and `end` define the query's time range &mdash; the metric-sample timestamps to which to apply the query (the default end time is `"now"` and the default start time is 1 hour before the end time); `step` defines the interval for aggregation or raw-data downsampling (default = the query's time range); and `aggregationWindow` defines the aggregation time frame for over-time aggregation (default = `step`).<br>
You can set the optional `multi_index` parameter to `True` to return labels as index columns, as demonstrated in the following examples.
By default, only the metric sample-time primary-key attribute is returned as an index column.<br>
See the [Frames API reference](https://www.iguazio.com/docs/reference/latest-release/api-reference/frames/tsdb/read/) for more information about the `read` parameters that are supported for the `tsdb` backend.
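
The SQL query format described above can be assembled from its parts. The following is a minimal sketch: `build_tsdb_query` is a hypothetical helper that only concatenates strings in the documented `select ... from '<table path>' [where ...] [group by ...]` layout. It performs no escaping or validation, so it isn't suitable for untrusted input.

```python
def build_tsdb_query(table_path, select="*", where=None, group_by=None):
    """Compose a TSDB SQL query string in the format described above."""
    if isinstance(select, (list, tuple)):
        select = ", ".join(select)
    query = f"select {select} from '{table_path}'"
    if where:
        query += f" where {where}"
    if group_by:
        if isinstance(group_by, (list, tuple)):
            group_by = ", ".join(group_by)
        query += f" group by {group_by}"
    return query


print(build_tsdb_query("iguazio/examples/tsdb_tab",
                       select=["avg(cpu)", "max(cpu)"],
                       where="os='linux'", group_by="node"))
```

The returned string is what would be passed as the `query` parameter of the `read` method.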

In [15]:
# Read all metrics from the TSDB table (start="0"; default `end` time = "now")
# into a single DataFrame (default `iterator`=False) and display the first 10
# items; show metric labels as index columns (multi_index=True)
df = client.read(backend="tsdb", table=tsdb_table, start="0", multi_index=True)
display(df.head(10))

time,os,node,cpu,disk,mem
2020-01-03 13:00:00.000,windows,11,36.729845,70.830602,23.903441
2020-01-03 20:18:15.652,windows,11,89.644225,79.411445,27.976424
2020-01-04 03:36:31.304,windows,11,54.789899,25.347536,14.526948
2020-01-04 10:54:46.956,windows,11,25.784494,54.963946,65.162713
2020-01-04 18:13:02.608,windows,11,94.760468,42.226179,7.512315
2020-01-05 01:31:18.260,windows,11,84.67462,72.365714,3.607919
2020-01-05 08:49:33.913,windows,11,35.783269,88.338686,24.270184
2020-01-05 16:07:49.565,windows,11,84.779386,34.15671,55.820135
2020-01-05 23:26:05.217,windows,11,41.311304,2.457344,13.02915
2020-01-06 06:44:20.869,windows,11,8.252393,80.533331,86.658962
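
A result that is returned with `multi_index=True` is an ordinary pandas multi-index DataFrame, so the label levels can be moved back into regular columns or used for selection. The following sketch builds a small frame with the same index layout (`time`, `os`, `node`) by hand, with made-up values, so that it runs without a Frames service.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2020-01-03 13:00:00"), "windows", "11"),
     (pd.Timestamp("2020-01-03 13:00:00"), "linux", "2")],
    names=["time", "os", "node"])
df = pd.DataFrame({"cpu": [36.7, 20.4]}, index=index)

# Move the label levels out of the index, keeping only `time` as the index
flat = df.reset_index(level=["os", "node"])
print(flat)

# Select every row with a given label value from the multi-index frame
windows_only = df.xs("windows", level="os")
print(windows_only)
```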


In [16]:
# Read the full table contents, as in the previous example but use an SQL query
query_str = f"select * from '{tsdb_table}'"
df = client.read(backend="tsdb", query=query_str, start="0", multi_index=True)
display(df.head(10))

time,node,os,mem,cpu,disk
2020-01-03 13:00:00.000,11,linux,25.343254,0.656822,8.574281
2020-01-03 20:18:15.652,11,linux,69.744752,81.027776,51.079915
2020-01-04 03:36:31.304,11,linux,85.808604,69.877973,37.40712
2020-01-04 10:54:46.956,11,linux,19.357673,49.965332,54.455909
2020-01-04 18:13:02.608,11,linux,11.521681,20.998817,44.940804
2020-01-05 01:31:18.260,11,linux,39.663167,39.889178,93.475519
2020-01-05 08:49:33.913,11,linux,1.108835,29.449807,28.209763
2020-01-05 16:07:49.565,11,linux,95.624381,45.305452,67.069894
2020-01-05 23:26:05.217,11,linux,57.865623,8.905794,7.331962
2020-01-06 06:44:20.869,11,linux,18.45993,73.773484,58.847895


In [17]:
# Read over-time aggregates with a 1-hour aggregation step for all metric
# samples created in the last day; use an SQL query (see `query`)
query_str = f"select avg(*), max(*), min(*) from '{tsdb_table}'"
df = client.read(backend="tsdb", query=query_str, step="1h", start="now-1d",
                 end="now", multi_index=True)
display(df)

time,node,os,avg(mem),max(mem),min(mem),avg(disk),max(disk),min(disk),avg(cpu),max(cpu),min(cpu)
2020-01-09 14:19:50,11,linux,37.799002,37.799002,37.799002,68.914148,68.914148,68.914148,3.529199,3.529199,3.529199
2020-01-09 22:19:50,11,linux,35.765542,35.765542,35.765542,97.210068,97.210068,97.210068,41.520614,41.520614,41.520614
2020-01-10 05:19:50,11,linux,1.441835,1.441835,1.441835,97.254813,97.254813,97.254813,5.513239,5.513239,5.513239
2020-01-10 12:19:50,11,linux,66.71358,66.71358,66.71358,27.026969,27.026969,27.026969,44.737859,44.737859,44.737859
2020-01-09 14:19:50,11,windows,10.88285,10.88285,10.88285,13.153557,13.153557,13.153557,11.34152,11.34152,11.34152
2020-01-09 22:19:50,11,windows,35.392134,35.392134,35.392134,17.554038,17.554038,17.554038,58.349334,58.349334,58.349334
2020-01-10 05:19:50,11,windows,39.593641,39.593641,39.593641,65.466283,65.466283,65.466283,53.406765,53.406765,53.406765
2020-01-10 12:19:50,11,windows,20.256509,20.256509,20.256509,67.936604,67.936604,67.936604,50.263656,50.263656,50.263656
2020-01-09 14:19:50,2,linux,98.972007,98.972007,98.972007,58.85104,58.85104,58.85104,88.422522,88.422522,88.422522
2020-01-09 22:19:50,2,linux,38.424995,38.424995,38.424995,28.257093,28.257093,28.257093,88.144606,88.144606,88.144606


In [18]:
# Perform a similar query as in the previous example but use a non-SQL query
# and group the results by the `os` label
df = client.read(backend="tsdb", table=tsdb_table, aggregators="avg, max, min",
                 step="1h", group_by="os", start="now-1d", end="now",
                 multi_index=True)
display(df)

time,os,avg(cpu),avg(disk),avg(mem),max(cpu),max(disk),max(mem),min(cpu),min(disk),min(mem)
2020-01-09 14:19:51,linux,45.97586,63.882594,68.385504,88.422522,68.914148,98.972007,3.529199,58.85104,37.799002
2020-01-09 22:19:51,linux,64.83261,62.733581,37.095269,88.144606,97.210068,38.424995,41.520614,28.257093,35.765542
2020-01-10 05:19:51,linux,31.296257,78.004052,21.995841,57.079275,97.254813,42.549847,5.513239,58.753292,1.441835
2020-01-10 12:19:51,linux,24.571874,43.601904,79.874932,44.737859,60.176839,93.036285,4.40589,27.026969,66.71358
2020-01-09 14:19:51,windows,41.065586,14.088049,30.864138,70.789653,15.02254,50.845427,11.34152,13.153557,10.88285
2020-01-09 22:19:51,windows,46.537274,9.157164,33.53782,58.349334,17.554038,35.392134,34.725215,0.760289,31.683506
2020-01-10 05:19:51,windows,68.614155,60.420644,20.810966,83.821545,65.466283,39.593641,53.406765,55.375004,2.028291
2020-01-10 12:19:51,windows,58.83941,65.098036,16.115655,67.415165,67.936604,20.256509,50.263656,62.259468,11.974801


<a id="frames-tsdb-read-conditional"></a>
#### Conditional Read

The following examples demonstrate how to use a query filter to conditionally read only a subset of the data from a TSDB table.<br>

- In non-SQL queries, this is done by setting the value of the `filter` parameter to a [platform filter expression](https://www.iguazio.com/docs/reference/latest-release/expressions/condition-expression/#filter-expression).
- In SQL queries, this is done by setting the `query` parameter to a query string that includes a `WHERE` clause with a platform filter expression written as an SQL expression.
  Note that the comparison operator for such queries is `=`, as opposed to `==` in non-SQL queries.

In [19]:
# Read over-time aggregates with a 1-day aggregation step for all metric
# samples in the table with the `os` label "linux" and the `node` label 11.
df = client.read(backend="tsdb", table=tsdb_table, aggregators="count,sum",
                 step="1d", start="0", filter="os=='linux' and node=='11'",
                 multi_index=True)
display(df)

time,node,os,count(cpu),count(disk),count(mem),sum(cpu),sum(disk),sum(mem)
2020-01-03,11,linux,2.0,2.0,2.0,81.684598,59.654196,95.088007
2020-01-04,11,linux,3.0,3.0,3.0,140.842122,136.803832,116.687958
2020-01-05,11,linux,4.0,4.0,4.0,123.550231,196.087137,194.262006
2020-01-06,11,linux,3.0,3.0,3.0,229.279558,127.376353,190.39619
2020-01-07,11,linux,3.0,3.0,3.0,217.58794,149.599111,129.668818
2020-01-08,11,linux,3.0,3.0,3.0,213.713018,200.724435,203.506269
2020-01-09,11,linux,4.0,4.0,4.0,181.222919,295.633202,211.495972
2020-01-10,11,linux,2.0,2.0,2.0,50.251098,124.281782,68.155415


In [21]:
# Read over-time aggregates with a 15-minute step for `mem` metric samples
# created in the last day with the `os` label "windows" and the `node` label 2,
# and group the results by the `node` label; use an SQL query.
query_str = f"select count(mem), sum(mem) from '{tsdb_table}' " + \
    "where os='windows' and node='2' group by node"
df = client.read(backend="tsdb", query=query_str, step="15m",
                 start="now-1d", multi_index=True)
display(df)

time,node,count(mem),sum(mem)
2020-01-09 22:09:52,2,1.0,31.683506
2020-01-10 05:39:52,2,1.0,2.028291
2020-01-10 12:54:52,2,1.0,11.974801


<a id="frames-tsdb-delete"></a>
### Delete the TSDB Table

Use the `delete` method of the Frames client with the `tsdb` backend to delete the TSDB table that was used in the previous steps.

In [22]:
client.delete("tsdb", tsdb_table)

<a id='frames-stream'></a>
## Working with Streams (stream Backend)

The platform supports streams that have an AWS Kinesis-like API. For more information, see the [platform documentation](https://www.iguazio.com/docs/concepts/latest-release/streams/).
<br>
This section demonstrates how to use the `stream` Frames backend to work with streams in the platform.

- [Initialization](#frames-stream-init)
- [Create a Stream](#frames-stream-create)
- [Write to the Stream](#frames-stream-write)
  - [Use the Write Method to Perform a Batch Update](#frames-stream-write-batch-update)
  - [Use the Execute Method's Put Command to Update a Single Record](#frames-stream-execute-put)
- [Read from the Stream](#frames-stream-read)
- [Delete the Stream](#frames-tsdb-stream)

<a id="frames-stream-init"></a>
### Initialization

Start out by defining a stream-path variable that will be used in the tutorial's code examples.<br>
The stream path (`strm`) is relative to the configured parent data container; see [Create a Stream](#frames-stream-create).

In [23]:
# Relative path to the stream within the parent platform data container
strm = os.path.join(os.getenv("V3IO_USERNAME"), "examples/somestream")
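
Note that `os.getenv` returns `None` when the variable is not set, in which case `os.path.join` raises a `TypeError`. The following is a minimal sketch of a more defensive variant; the `stream_path` helper is hypothetical, and it assumes that the relative stream path should always use forward slashes.

```python
import os
import posixpath


def stream_path(name, env=None):
    """Return '<username>/examples/<name>' or fail with a clear message.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    user = env.get("V3IO_USERNAME")
    if not user:
        raise RuntimeError("The V3IO_USERNAME environment variable is not set")
    # posixpath keeps forward slashes regardless of the local operating system
    return posixpath.join(user, "examples", name)


# Example with an explicit environment mapping ("iguazio" is a placeholder user)
print(stream_path("somestream", env={"V3IO_USERNAME": "iguazio"}))
# iguazio/examples/somestream
```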

<a id="frames-stream-create"></a>
### Create a Stream

Use the `create` method of the Frames client with the `stream` backend to create a new data stream.<br>
The mandatory `table` parameter specifies the relative stream path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).
In the following example, the relative stream path is set by using the `strm` variable that was defined in the [stream backend initialization](#frames-stream-init) step.<br>
You can optionally provide additional arguments.
For example, you can set the `shards` argument to the number of shards in the stream, or you can set the `retention_hours` argument to the stream's retention period in hours.

In [25]:
# Create a new stream
client.create(backend="stream", table=strm, retention_hours=48, shards=1)
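
If the retention period is tracked as a duration rather than as a number of hours, it can be converted before the call. The following sketch assumes that `retention_hours` should be a whole number of hours; the `retention_in_hours` helper is hypothetical and not part of Frames.

```python
from datetime import timedelta


def retention_in_hours(period):
    """Convert a timedelta to a whole number of hours (hypothetical helper)."""
    hours, remainder = divmod(period.total_seconds(), 3600)
    if hours < 1 or remainder:
        raise ValueError("The retention period must be a whole number of hours")
    return int(hours)


create_kwargs = {
    "backend": "stream",
    "retention_hours": retention_in_hours(timedelta(days=2)),
    "shards": 1,
}
print(create_kwargs["retention_hours"])  # 48

# With an initialized client and a stream path (`strm`):
# client.create(table=strm, **create_kwargs)
```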

<a id="frames-stream-write"></a>
### Write to the Stream

You can use either of the following methods to ingest data into your stream:

- [Use the Write Method to Perform a Batch Update](#frames-stream-write-batch-update)
- [Use the Execute Method's Put Command to Update a Single Record](#frames-stream-execute-put)

<a id="frames-stream-write-batch-update"></a>
#### Use the Write Method to Perform a Batch Update

Use the `write` method of the Frames client with the `stream` backend to ingest multiple records into your stream (batch update), as demonstrated in the following example.<br>
The `dfs` parameter can be set either to a single DataFrame (as done in the following example) or to multiple DataFrames &mdash; either as a DataFrames iterator or as a list of DataFrames.

In [26]:
# Prepare the ingestion data
import numpy as np
from datetime import datetime, timedelta

end = datetime.now().replace(minute=0, second=0, microsecond=0)
rng = pd.date_range(end=end, periods=60, freq="300s", tz="Israel")
df = pd.DataFrame(np.random.randn(len(rng), 3), index=rng, columns=["cpu", "mem", "disk"])

# Ingest data into the stream
client.write("stream", table=strm, dfs=df)
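
Because `dfs` also accepts a list of DataFrames, a large DataFrame can be split into smaller pieces before ingestion. The following sketch shows one way to do this with plain pandas; the `split_frame` helper and the chunk size are illustrative assumptions, and the final `write` call is left as a comment because it requires an initialized client.

```python
import numpy as np
import pandas as pd


def split_frame(df, rows_per_chunk):
    """Split a DataFrame into consecutive chunks of at most `rows_per_chunk` rows."""
    return [df.iloc[i:i + rows_per_chunk] for i in range(0, len(df), rows_per_chunk)]


# Sample data similar to the previous example (UTC is used here for portability)
rng = pd.date_range(end="2020-01-10 12:00", periods=60, freq="5min", tz="UTC")
df = pd.DataFrame(np.random.randn(len(rng), 3), index=rng, columns=["cpu", "mem", "disk"])

chunks = split_frame(df, 25)
print([len(chunk) for chunk in chunks])  # [25, 25, 10]

# With an initialized client and a stream path (`strm`):
# client.write("stream", table=strm, dfs=chunks)
```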

<a id="frames-stream-execute-put"></a>
#### Use the Execute Method's Put Command to Update a Single Record

Use the `put` command of the `execute` method of the Frames client with the `stream` backend to add a single record to a stream.<br>
Use the `args` parameter of the `put` command to provide the necessary information:
set the mandatory `data` argument to the ingested record data.
You can optionally set the `clientinfo` argument to additional metadata and the `partition` argument to a partition key; records with the same partition key are assigned to the same shard.

In [27]:
client.execute("stream", strm, "put", args={"data": "abcd", "clientinfo": "123"})
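
The `args` dictionary can also be assembled programmatically so that the optional arguments are included only when they are set. This is a minimal sketch; the `make_put_args` helper and the `"node-1"` partition key are hypothetical.

```python
def make_put_args(data, clientinfo=None, partition=None):
    """Build the `args` dictionary for the stream `put` command.

    `data` is mandatory; `clientinfo` and `partition` are added only when given.
    Hypothetical helper, not part of Frames.
    """
    args = {"data": data}
    if clientinfo is not None:
        args["clientinfo"] = clientinfo
    if partition is not None:
        args["partition"] = partition
    return args


print(make_put_args("abcd", clientinfo="123"))
# {'data': 'abcd', 'clientinfo': '123'}

# With an initialized client and a stream path (`strm`):
# client.execute("stream", strm, "put", args=make_put_args("abcd", partition="node-1"))
```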

<a id="frames-stream-read"></a>
### Read from the Stream

Use the `read` method of the Frames client with the `stream` backend to read data from your stream.<br>
The mandatory `seek` parameter specifies the seek method, which determines the location within the target stream shard from which to read; some methods require setting additional parameters:

- `"earliest"` &mdash; start from the earliest point in the shard (no additional parameters).
- `"latest"` &mdash; start from the latest location in the shard (i.e., consume only new records).
- `"time"` &mdash; start from a specific point in time, as specified in the `start` parameter (for example, `start="now-1d"`).
- `"sequence"` &mdash; start from a specific record sequence number, as specified in the `sequence` parameter (for example, `sequence=45`).
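
The relationship between the seek methods and their additional parameters can be sketched as a small validation helper that prepares the keyword arguments for `read`. The `stream_read_kwargs` helper is hypothetical; it only encodes the rules listed above and does not call Frames.

```python
# Additional parameter required by each seek method (None = no extra parameter)
SEEK_REQUIRES = {"earliest": None, "latest": None, "time": "start", "sequence": "sequence"}


def stream_read_kwargs(table, seek, shard_id="0", **extra):
    """Validate the seek method and return keyword arguments for `client.read`."""
    if seek not in SEEK_REQUIRES:
        raise ValueError(f"Unknown seek method: {seek!r}")
    required = SEEK_REQUIRES[seek]
    if required is not None and required not in extra:
        raise ValueError(f"seek={seek!r} requires the {required!r} parameter")
    return {"backend": "stream", "table": table, "seek": seek, "shard_id": shard_id, **extra}


kwargs = stream_read_kwargs("examples/somestream", "time", start="now-1d")
print(kwargs["seek"], kwargs["start"])  # time now-1d

# With an initialized client:
# df = client.read(**kwargs)
```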

The `read` method can return a single DataFrame (default) or a DataFrames iterator (a stream) if the `iterator` parameter is set to `True`, as demonstrated in the following example.

In [28]:
# Read from the earliest available location (seek="earliest") in the first stream shard (shard_id="0");
# return the result as a DataFrames iterator (iterator=True), then iterate over it and print the returned data
dfs = client.read("stream", strm, seek="earliest", shard_id="0", iterator=True)
for df in dfs:
    print(df.head(4))

                 cpu      disk               index-0       mem raw_data  \
seq_number                                                                
1          -0.280861  0.005962  2020-01-10T10:05:00Z  0.085445            
2          -1.538130  1.065969  2020-01-10T10:10:00Z  0.780875            
3           0.298100 -0.365898  2020-01-10T10:15:00Z -1.902204            
4           0.177452 -0.287416  2020-01-10T10:20:00Z -1.220685            

                             stream_time  
seq_number                                
1          2020-01-10 17:11:05.711482769  
2          2020-01-10 17:11:05.711482769  
3          2020-01-10 17:11:05.711482769  
4          2020-01-10 17:11:05.711482769  
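
When the result is returned as a DataFrames iterator, the partial results can be combined into a single DataFrame with `pandas.concat`. The following sketch uses a stand-in generator (`fake_stream_reader`, hypothetical) in place of `client.read(..., iterator=True)`, with a few `cpu` values taken from the output above.

```python
import pandas as pd


def collect_frames(frames):
    """Concatenate an iterable of DataFrames; return an empty DataFrame if there are none."""
    frames = list(frames)
    return pd.concat(frames) if frames else pd.DataFrame()


def fake_stream_reader():
    """Stand-in for a DataFrames iterator returned by a stream read."""
    yield pd.DataFrame({"cpu": [-0.280861, -1.538130]},
                       index=pd.Index([1, 2], name="seq_number"))
    yield pd.DataFrame({"cpu": [0.298100, 0.177452]},
                       index=pd.Index([3, 4], name="seq_number"))


combined = collect_frames(fake_stream_reader())
print(combined.shape)  # (4, 1)
```

Collecting everything into one DataFrame is convenient for small reads; for large streams, processing each DataFrame as it arrives keeps memory usage lower.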


<a id="frames-tsdb-stream"></a>
### Delete the Stream

Use the `delete` method of the Frames client with the `stream` backend to delete the stream that was used in the previous steps.

In [29]:
client.delete("stream", strm)

<a id="frames-cleanup"></a>
## Cleanup

You can optionally delete any of the directories or files that you created.
See the instructions in the [Creating and Deleting Container Directories](https://www.iguazio.com/docs/tutorials/latest-release/getting-started/containers/#create-delete-container-dirs) tutorial.
For example, the following code uses a local file-system command to delete the entire **&lt;running user&gt;/examples/** directory in the "users" container.
Edit the path, as needed, then remove the comment mark (`#`) and run the code.

In [30]:
#!rm -rf /User/examples/