# Using the V3IO Frames Library for High-Performance Data Access 

- [Overview](#frames-overview)
- [Initialization](#frames-init)
- [Working with NoSQL Tables (kv Backend)](#frames-kv)
- [Working with Time-Series Databases (tsdb Backend)](#frames-tsdb)
- [Working with Streams (stream Backend)](#frames-stream)
- [Cleanup](#frames-cleanup)

<a id="frames-overview"></a>
## Overview

[V3IO Frames](https://github.com/v3io/frames) (**"Frames"**) is a multi-model open-source data-access library, developed by Iguazio, which provides a unified high-performance DataFrame API for working with data in the data store of the Iguazio Data Science Platform (**"the platform"**).
Frames currently supports the NoSQL (key/value), stream, and time-series (TSDB) data models via its `kv`, `stream`, and `tsdb` backends.

To use Frames, you first need to import the **v3io_frames** library and create and initialize a client object (an instance of the `Client` class).<br>
The `Client` class features the following object methods for supporting basic data operations; the type of data is derived from the backend type (`tsdb` &mdash; TSDB table / `kv` &mdash; NoSQL table / `stream` &mdash; data stream):

- `create` &mdash; creates a new TSDB table or stream ("backend data").
- `delete` &mdash; deletes a table or stream.
- `read` &mdash; reads data from a table or stream into pandas DataFrames.
- `write` &mdash; writes data from pandas DataFrames to a table or stream.
- `execute` &mdash; executes a command on a table or stream.
  Each backend may support multiple commands.

For a detailed description of the Frames API, see the [Frames API reference](https://www.iguazio.com/docs/reference/latest-release/api-reference/frames/).<br>
For more help and usage details, use the internal API help &mdash; `<client object>.<command>?` in Jupyter Notebook or `print(<client object>.<command>.__doc__)`.<br>
For example, the following command returns information about the read operation for a client object named `client`:
```
client.read?
```

<a id="frames-init"></a>
## Initialization

To use V3IO Frames, first ensure that your platform tenant has a shared tenant-wide instance of the V3IO Frames service.
This can be done by a platform service administrator from the **Services** dashboard page.<br>
Then, import the required libraries and create a Frames client object (an instance of the `Client` class), as demonstrated in the following code, which creates a client object named `client`.

> **Note:**
> - The client constructor's `container` parameter is set to `"users"` for accessing data in the platform's "users" data container.
> - Because no authentication credentials are passed to the constructor, Frames will use the access key that's assigned to the `V3IO_ACCESS_KEY` environment variable.
>   The platform's Jupyter Notebook service defines this variable automatically and initializes it to a valid access key for the running user of the service.
>   You can pass different credentials by using the constructor's `token` parameter (platform access key) or `user` and `password` parameters (platform username and password).
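
The credential fallback described in the note can be pictured with the following sketch. It only illustrates the lookup order and is not the Frames implementation: the `resolve_access_key` helper is hypothetical, and only the `V3IO_ACCESS_KEY` variable name comes from the note above.

```python
import os


def resolve_access_key(token=None, environ=None):
    """Return an explicit token if given, else the V3IO_ACCESS_KEY value.

    Hypothetical helper: it only mirrors the lookup order described in the
    note and is not part of the v3io_frames API.
    """
    environ = os.environ if environ is None else environ
    if token:
        return token
    return environ.get("V3IO_ACCESS_KEY")


# Example with a fake environment mapping (no real credentials involved)
fake_env = {"V3IO_ACCESS_KEY": "key-from-env"}
print(resolve_access_key(environ=fake_env))                    # key-from-env
print(resolve_access_key(token="explicit", environ=fake_env))  # explicit
```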

In [1]:
import pandas as pd
import v3io_frames as v3f
import os

# Create a Frames client
client = v3f.Client("framesd:8081", container="users")

<a id='frames-kv'></a>
## Working with NoSQL Tables (kv Backend)

This section demonstrates how to use the `kv` Frames backend to write and read NoSQL data in the platform.

- [Initialization](#frames-kv-init)
- [Write to a NoSQL Table](#frames-kv-write)
- [Read from the Table Using an SQL Query](#frames-kv-read-sql-query)
- [Read from the Table Using the Frames API](#frames-kv-read-frames-api)
  - [Read Using a Single DataFrame](#frames-kv-read-frames-api-single-df)
  - [Read Using a DataFrames Iterator (Streaming)](#frames-kv-read-frames-api-df-iterator)
- [Delete the NoSQL Table](#frames-kv-delete)

<a id="frames-kv-init"></a>
### Initialization

Start out by defining table-path variables that will be used in the tutorial's code examples.<br>
The table path (`table`) is relative to the configured parent data container; see [Write to a NoSQL Table](#frames-kv-write).

In [2]:
# Relative path to the NoSQL table within the parent platform data container
table = os.path.join(os.getenv("V3IO_USERNAME"), "examples", "bank")

# Full path to the NoSQL table for SQL queries (platform Presto data-path syntax);
# use the same data container as used for the Frames client ("users")
sql_table_path = 'v3io.users."' + table + '"'
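
The path concatenation above can also be wrapped in a small helper. The following is a minimal sketch; the `presto_table_path` function name is an assumption made for illustration, and it uses `posixpath.join` so that the separators are always forward slashes.

```python
import posixpath


def presto_table_path(container, *parts):
    """Build a Presto data path of the form v3io.<container>."<table path>"."""
    table_path = posixpath.join(*parts)
    return f'v3io.{container}."{table_path}"'


# "iguazio" stands in for the running user (normally read from V3IO_USERNAME)
print(presto_table_path("users", "iguazio", "examples", "bank"))
# v3io.users."iguazio/examples/bank"
```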

<a id="frames-kv-write"></a>
### Write to a NoSQL Table

Read a file from an Amazon Simple Storage Service (S3) bucket into a pandas DataFrame, and use the `write` method of the Frames client with the `kv` backend to write the data to a NoSQL table.<br>
The mandatory `table` parameter specifies the relative table path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).
In the following example, the relative table path is set by using the `table` variable that was defined in the [kv backend initialization](#frames-kv-init) step.<br>
The `dfs` parameter can be set either to a single DataFrame (as done in the following example) or to multiple DataFrames &mdash; either as a DataFrames iterator or as a list of DataFrames.

In [3]:
# Prepare the ingestion data by reading an AWS S3 file into a DataFrame
df = pd.read_csv("https://s3.amazonaws.com/iguazio-sample-data/bank.csv", sep=";")
# Display DataFrame info & head (optional - for testing)
# display(df.info(), df.head())

In [4]:
# Write data from a DataFrame to a NoSQL table
client.write("kv", table=table, dfs=df)

<a id="frames-kv-read-sql-query"></a>
### Read from the Table Using an SQL Query

You can run SQL queries on your NoSQL table (using Presto) to offload data filtering, grouping, joins, etc. to a scale-out high-speed database engine.

> **Note:** To query a table in a platform data container, the table path in the `from` section of the SQL query should be of the format `v3io.<container name>."/path/to/table"`.
> See [Presto Data Paths](https://www.iguazio.com/docs/tutorials/latest-release/getting-started/fundamentals/#data-paths-presto) in the platform documentation.
> In the following example, the path is set by using the `sql_table_path` variable that was defined in the [kv backend initialization](#frames-kv-init) step.
> Unless you changed the code, this variable translates to `v3io.users."<running user>/examples/bank"`; for example, `v3io.users."iguazio/examples/bank"` for user "iguazio".

In [5]:
%sql select * from $sql_table_path where balance > 10000 limit 8

Done.


loan,education,previous,housing,poutcome,duration,marital,default,balance,month,contact,campaign,y,idx,job,day,age,pdays
no,tertiary,0,no,unknown,420,married,no,15520,nov,cellular,1,no,1778,management,18,56,-1
no,secondary,0,no,unknown,29,married,no,12186,jun,unknown,3,no,272,management,20,46,-1
no,secondary,0,no,unknown,272,single,no,10177,may,cellular,4,no,1211,admin.,5,66,-1
no,tertiary,1,no,failure,172,married,no,15834,apr,cellular,3,no,1805,retired,5,70,186
no,tertiary,8,no,other,138,married,no,13669,oct,cellular,1,no,822,self-employed,15,40,136
yes,unknown,0,no,unknown,166,married,no,21244,aug,cellular,2,no,1821,housemaid,4,51,-1
no,secondary,0,no,unknown,125,single,no,11555,apr,cellular,2,no,561,student,8,28,-1
no,tertiary,0,yes,unknown,397,married,no,14220,sep,cellular,1,yes,2962,retired,9,71,-1


<a id="frames-kv-read-frames-api"></a>
### Read from the Table Using the Frames API

Use the `read` method of the Frames client with the `kv` backend to read data from your NoSQL table.<br>
The `read` method can return a DataFrame or a DataFrames iterator (a stream), as demonstrated in the following examples.

- [Read Using a Single DataFrame](#frames-kv-read-frames-api-single-df)
- [Read Using a DataFrames Iterator (Streaming)](#frames-kv-read-frames-api-df-iterator)

<a id="frames-kv-read-frames-api-single-df"></a>
#### Read Using a Single DataFrame

The following example uses a single command to read data from the NoSQL table into a DataFrame.

In [6]:
df = client.read(backend="kv", table=table, filter="balance > 20000")
df.head(8)

idx,age,balance,campaign,contact,day,default,duration,education,housing,job,loan,marital,month,pdays,poutcome,previous,y
1821,51,21244,2,cellular,4,no,166,unknown,no,housemaid,yes,married,aug,-1,unknown,0,no
2624,53,22370,1,unknown,15,no,106,tertiary,yes,entrepreneur,no,married,may,-1,unknown,0,no
4014,41,21515,1,unknown,5,no,87,secondary,yes,admin.,no,married,jun,-1,unknown,0,no
871,31,26965,2,cellular,21,no,654,primary,no,housemaid,no,single,apr,-1,unknown,0,yes
1483,43,27733,7,unknown,3,no,164,tertiary,yes,technician,no,single,jun,-1,unknown,0,no
2776,37,22856,1,cellular,2,no,154,primary,no,management,no,married,jul,388,failure,1,no
650,33,23663,2,cellular,16,no,199,tertiary,yes,housemaid,no,single,apr,146,failure,2,no
3830,57,27069,3,unknown,20,no,174,tertiary,no,technician,yes,married,jun,-1,unknown,0,no


<a id="frames-kv-read-frames-api-df-iterator"></a>
#### Read Using a DataFrames Iterator (Streaming)

The following example uses a DataFrames iterator to stream data from the NoSQL table into multiple DataFrames and allow concurrent data movement and processing.<br>
The example sets the `iterator` parameter to `True` to receive a DataFrames iterator (instead of the default single DataFrame), and then iterates the DataFrames in the returned iterator; you can also use `concat` instead of iterating the DataFrames.

> **Note:** Iterators work with all Frames backends and can be used as input to write functions that support this, such as the `write` method of the Frames client.
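
The `concat` alternative mentioned above can be sketched as follows. To keep the example self-contained, a plain Python generator of small DataFrames stands in for the iterator that `client.read(..., iterator=True)` returns; the `fake_read_iterator` function and its data are made up for illustration.

```python
import pandas as pd


def fake_read_iterator():
    """Stand-in for a DataFrames iterator; yields two small DataFrames."""
    yield pd.DataFrame({"balance": [21244, 22370]}, index=[1821, 2624])
    yield pd.DataFrame({"balance": [26965]}, index=[871])


# Combine all the DataFrames from the iterator into a single DataFrame
combined = pd.concat(fake_read_iterator())
print(len(combined))  # 3
```

Concatenating gives up the streaming benefit, because all the data is held in memory at once, so it is probably best suited to results that are known to be small.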

In [7]:
dfs = client.read(backend="kv", table=table, filter="balance > 20000",
                  iterator=True)
for df in dfs:
    display(df.head())

idx,age,balance,campaign,contact,day,default,duration,education,housing,job,loan,marital,month,pdays,poutcome,previous,y
1821,51,21244,2,cellular,4,no,166,unknown,no,housemaid,yes,married,aug,-1,unknown,0,no
871,31,26965,2,cellular,21,no,654,primary,no,housemaid,no,single,apr,-1,unknown,0,yes
1483,43,27733,7,unknown,3,no,164,tertiary,yes,technician,no,single,jun,-1,unknown,0,no
2624,53,22370,1,unknown,15,no,106,tertiary,yes,entrepreneur,no,married,may,-1,unknown,0,no
4014,41,21515,1,unknown,5,no,87,secondary,yes,admin.,no,married,jun,-1,unknown,0,no


<a id="frames-kv-delete"></a>
### Delete the NoSQL Table

Use the `delete` method of the Frames client with the `kv` backend to delete the NoSQL table that was used in the previous steps.

In [8]:
# Delete the `table` NoSQL table
client.delete("kv", table)

<a id='frames-tsdb'></a>
## Working with Time-Series Databases (tsdb Backend)

This section demonstrates how to use the `tsdb` Frames backend to create a time-series database (TSDB) table in the platform, ingest data into the table, and read from the table (i.e., submit TSDB queries).

- [Initialization](#frames-tsdb-init)
- [Create a TSDB Table](#frames-tsdb-create)
- [Write to the TSDB Table](#frames-tsdb-write)
- [Read from the TSDB Table](#frames-tsdb-read)
  - [Conditional Read](#frames-tsdb-read-conditional)
- [Delete the TSDB Table](#frames-tsdb-delete)

<a id="frames-tsdb-init"></a>
### Initialization

Start out by defining a TSDB table-path variable that will be used in the tutorial's code examples.<br>
The table path (`tsdb_table`) is relative to the configured parent data container; see [Create a TSDB Table](#frames-tsdb-create).

In [9]:
# Relative path to the TSDB table within the parent platform data container
tsdb_table = os.path.join(os.getenv("V3IO_USERNAME"), "examples", "tsdb_tab")

<a id="frames-tsdb-create"></a>
### Create a TSDB Table

Use the `create` method of the Frames client with the `tsdb` backend to create a new TSDB table.<br>
The mandatory `table` parameter specifies the relative table path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).
In the following example, the relative table path is set by using the `tsdb_table` variable that was defined in the [tsdb backend initialization](#frames-tsdb-init) step.<br>
You must set the `rate` argument to the ingestion rate of the TSDB metric-samples, as `"[0-9]+/[smh]"` (where '`s`' = seconds, '`m`' = minutes, and '`h`' = hours); for example, `1/s` (one sample per second).
It's recommended that you set the rate to the average expected ingestion rate, and that the ingestion rates for a given TSDB table don't vary significantly; when there's a big difference in the ingestion rates (for example, x10), use separate TSDB tables.
You can also set additional optional arguments, such as `aggregates` or `aggregation_granularity`.
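
The rate format described above can be checked before calling `create`. The following is a minimal sketch under that assumption; the `rate_per_second` helper is hypothetical and is not part of Frames. It also shows one way to compare two rates against the x10 guideline.

```python
import re

# Rate format from the text above: "[0-9]+/[smh]"
_RATE_RE = re.compile(r"([0-9]+)/([smh])")
_SECONDS = {"s": 1, "m": 60, "h": 3600}


def rate_per_second(rate):
    """Parse a rate string such as "1/s" into samples per second."""
    match = _RATE_RE.fullmatch(rate)
    if match is None:
        raise ValueError(f"invalid rate: {rate!r}")
    count, unit = match.groups()
    return int(count) / _SECONDS[unit]


print(rate_per_second("1/s"))   # 1.0
print(rate_per_second("30/m"))  # 0.5
# Ratio between two rates; a large ratio suggests using separate tables
print(rate_per_second("1/s") / rate_per_second("6/m"))
```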

In [10]:
# Create a new TSDB table; ingestion rate = one sample per hour ("1/h")
client.create(backend="tsdb", table=tsdb_table, rate="1/h")

<a id="frames-tsdb-write"></a>
### Write to the TSDB Table

Use the `write` method of the Frames client with the `tsdb` backend to ingest data from a pandas DataFrame into your TSDB table.<br>
The primary-key attribute of platform TSDB tables (i.e., the DataFrame index column) must hold the sample time of the data (displayed as `time` in read outputs).<br>
In addition, TSDB table items (rows) can optionally have sub-index columns (attributes) that are called labels.
You can add labels to TSDB table items in one of two ways; you can also combine these methods:

- Use the `labels` dictionary parameter of the `write` method to add labels to all the written metric-sample table items (DataFrame rows) &mdash; `{<label>: <value>[, <label>: <value>, ...]}`.<br>
  For example, `{"node": "11", "os": "linux"}`.
  Note that the label values must be provided as strings.
- Define DataFrame index columns for the labels.
  All DataFrame index columns except for the sample-time index column are automatically converted into labels for the respective table items.
  > **Note:** If you wish to use regular columns in your DataFrames as metric labels, convert these columns to index columns.
  > The following example converts the `symbol` and `exchange` columns to index columns that will be used as metric labels (in addition to the `time` index column):<br>
  > ```python
  > df.index.name="time"                              # Name the sample-time index column "time"
  > df.reset_index(level=0, inplace=True)             # Reset the DataFrame indexes
  > df = df.set_index(["time", "symbol", "exchange"]) # Define the time and label columns as index columns
  > ```
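
Because label values must be strings, it can help to normalize a labels dictionary before passing it to `write`. The following is a minimal sketch; the `stringify_labels` helper is hypothetical and not part of Frames.

```python
def stringify_labels(labels):
    """Return a copy of `labels` with every name and value as a string."""
    return {str(name): str(value) for name, value in labels.items()}


print(stringify_labels({"node": 11, "os": "linux"}))
# {'node': '11', 'os': 'linux'}
```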

In [11]:
import numpy as np
from datetime import datetime, timedelta


# Generate a DataFrame with TSDB metric samples and a "time" index column
def gen_df_w_tsdb_data(num_items=24, freq="1H", end=None, start=None,
                       start_delta=None, tz=None, normalize=False, zero=False,
                       attrs=["cpu", "mem", "disk"]):
    if (start is None and start_delta is not None and end is not None):
        start = end - timedelta(days=start_delta)
    if (zero):
        if (end is not None):
            end = end.replace(minute=0, second=0, microsecond=0)
        if (start is not None):
            start = start.replace(minute=0, second=0, microsecond=0)
    # If `start`, `end`, `num_items` (date_range() `periods`), and `freq`
    # are set, ignore `freq`
    if (freq is not None and start is not None and end is not None and
            num_items is not None):
        freq = None
    times = pd.date_range(periods=num_items, freq=freq, start=start, end=end,
                          tz=tz, normalize=normalize)
    data = np.random.rand(num_items, len(attrs)) * 100
    df = pd.DataFrame(data, index=times, columns=attrs)
    df.index.name = "time"
    return df

In [12]:
# Prepare DataFrames with randomly generated metric samples
end_t = datetime.now()
start_delta = 7  # start time = end_t - 7 days
dfs = []
for i in range(4):
    # Generate a new DataFrame with TSDB metrics
    dfs.append(gen_df_w_tsdb_data(end=end_t, start_delta=start_delta,
                                  zero=True))
    # Display DataFrame info & head (optional - for testing)
    # print("\n** dfs[" + str(i) + "] **")
    # display(dfs[i].info(), dfs[i].head())

In [13]:
# Write to a TSDB table

# Prepare metric labels to write
labels = [
    {"node": "11", "os": "linux"},
    {"node": "2", "os": "windows"},
    {"node": "11", "os": "windows"},
    {"node": "2", "os": "linux"}
]

# Write the contents of the prepared DataFrames to a TSDB table. Use multiple
# write commands with the `labels` parameter to set different label values.
num_dfs = len(dfs)
for i in range(num_dfs):
    client.write("tsdb", table=tsdb_table, dfs=dfs[i], labels=labels[i])

<a id="frames-tsdb-read"></a>
### Read from the TSDB Table

- [Overview and Basic Examples](#frames-tsdb-read-basic)
- [Conditional Read](#frames-tsdb-read-conditional)

<a id="frames-tsdb-read-basic"></a>
#### Overview and Basic Examples

Use the `read` method of the Frames client with the `tsdb` backend to read data from your TSDB table (i.e., query the database).<br>
You can perform one of two types of queries (but you cannot mix the two); note that you also cannot mix raw sample-data queries and aggregation queries:

- **A non-SQL query** &mdash; set the `table` parameter to the path to the TSDB table, and optionally set additional method parameters to configure the query.
  `columns` defines the query metrics (default = all); `aggregators` defines aggregation functions ("aggregators") to execute for all the configured metrics; `filter` restricts the query by using a platform [filter expression](https://www.iguazio.com/docs/reference/latest-release/expressions/condition-expression/#filter-expression); and `group_by` allows grouping the results by specific metric labels.
- **An SQL query** \[Tech Preview\] &mdash; set the `query` parameter to an SQL query string of the following format:
  ```
  select <metrics | aggregators> from '<table path>' [where <filter expression>] [group by <labels>]
  ```
  > **Note:**
  > - In SQL queries, the path to the TSDB table is set in the `FROM` clause of the `query` string and not in the `read` method's `table` parameter.
  > - The `where` filter expression is similar to that passed to the `filter` parameter for a non-SQL query, except it's in SQL format, so the expression isn't embedded within quotation marks and comparisons are done by using the '`=`' operator instead of the '`==`' operator.
  > - The `select` clause can optionally include a comma-separated list of either over-time aggregators (such as `avg` or `sum`) or cross-series aggregators (such as `avg_all` or `sum_all`), but you cannot mix these aggregation types.
  >   The aggregation functions receive a metric-name parameter (for example, `avg(cpu)`, `avg_all(cpu)`, or `avg(*)` for all metrics).
  >   Cross-series aggregation functions can also optionally receive one of the interpolation functions `next` (default), `prev`, `linear`, or `none`, in which case the metric name is passed as a parameter of the interpolation function (and not as a direct parameter of the aggregation function); the interpolation function can also optionally receive an interpolation-tolerance string of the format `"[0-9]+[mhd]"` (for example, `avg_all(prev(cpu,'1h'))`).

For both types of queries, you can also optionally set additional parameters.
`start` and `end` define the query's time range, i.e., the metric-sample timestamps to which to apply the query (the default end time is `"now"` and the default start time is 1 hour before the end time); `step` defines the interval for aggregation or raw-data downsampling (default = the query's time range); and `aggregation_window` defines the aggregation time frame for over-time aggregation (default = `step`).<br>
You can set the optional `multi_index` parameter to `True` to return labels as index columns, as demonstrated in the following examples.
By default, only the metric sample-time primary-key attribute is returned as an index column.<br>
See the [Frames API reference](https://www.iguazio.com/docs/reference/latest-release/api-reference/frames/tsdb/read/) for more information about the `read` parameters that are supported for the `tsdb` backend.
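
The SQL query format shown above can be assembled with a small helper before it is passed to the `query` parameter. The following is a minimal sketch; the `build_tsdb_query` function is hypothetical, and the table path in the usage example is only a placeholder.

```python
def build_tsdb_query(table, select="*", where=None, group_by=None):
    """Compose a TSDB SQL query string in the format shown above."""
    query = f"select {select} from '{table}'"
    if where:
        query += f" where {where}"
    if group_by:
        query += f" group by {group_by}"
    return query


print(build_tsdb_query("iguazio/examples/tsdb_tab", select="avg(cpu)",
                       where="os='linux'", group_by="node"))
# select avg(cpu) from 'iguazio/examples/tsdb_tab' where os='linux' group by node
```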

In [14]:
# Read all metrics from the TSDB table (start="0"; default `end` time = "now")
# into a single DataFrame (default `iterator`=False) and display the first 8
# items; show metric labels as index columns (multi_index=True)
df = client.read(backend="tsdb", table=tsdb_table, start="0", multi_index=True)
display(df.head(8))

time,os,node,cpu,disk,mem
2020-01-07 13:00:00.000,linux,2,93.618128,45.093516,16.85631
2020-01-07 20:18:15.652,linux,2,76.274239,17.068743,2.761269
2020-01-08 03:36:31.304,linux,2,50.248348,11.062812,66.214595
2020-01-08 10:54:46.956,linux,2,10.433592,26.941096,23.913899
2020-01-08 18:13:02.608,linux,2,56.600875,33.884964,98.863358
2020-01-09 01:31:18.260,linux,2,32.95847,97.106561,63.253016
2020-01-09 08:49:33.913,linux,2,22.337103,48.128245,57.658456
2020-01-09 16:07:49.565,linux,2,18.384175,66.318552,19.707752


In [15]:
# Read the full table contents, as in the previous example but use an SQL query
query_str = f"select * from '{tsdb_table}'"
df = client.read(backend="tsdb", query=query_str, start="0", multi_index=True)
display(df.head(8))

time,node,os,disk,mem,cpu
2020-01-07 13:00:00.000,2,linux,45.093516,16.85631,93.618128
2020-01-07 20:18:15.652,2,linux,17.068743,2.761269,76.274239
2020-01-08 03:36:31.304,2,linux,11.062812,66.214595,50.248348
2020-01-08 10:54:46.956,2,linux,26.941096,23.913899,10.433592
2020-01-08 18:13:02.608,2,linux,33.884964,98.863358,56.600875
2020-01-09 01:31:18.260,2,linux,97.106561,63.253016,32.95847
2020-01-09 08:49:33.913,2,linux,48.128245,57.658456,22.337103
2020-01-09 16:07:49.565,2,linux,66.318552,19.707752,18.384175


In [16]:
# Read over-time aggregates with a 1-hour aggregation step for all metric
# samples created in the last day; use an SQL query (see `query`)
query_str = f"select avg(*), max(*), min(*) from '{tsdb_table}'"
df = client.read(backend="tsdb", query=query_str, step="1h", start="now-1d",
                 end="now", multi_index=True)
display(df)

time,node,os,avg(mem),max(mem),min(mem),avg(disk),max(disk),min(disk),avg(cpu),max(cpu),min(cpu)
2020-01-13 14:42:38,2,linux,57.823986,57.823986,57.823986,34.199523,34.199523,34.199523,95.923701,95.923701,95.923701
2020-01-13 21:42:38,2,linux,97.207454,97.207454,97.207454,15.008668,15.008668,15.008668,35.710565,35.710565,35.710565
2020-01-14 04:42:38,2,linux,45.19139,45.19139,45.19139,37.586579,37.586579,37.586579,28.403664,28.403664,28.403664
2020-01-14 12:42:38,2,linux,77.834833,77.834833,77.834833,97.161394,97.161394,97.161394,54.700933,54.700933,54.700933
2020-01-13 14:42:38,2,windows,13.342256,13.342256,13.342256,12.406936,12.406936,12.406936,85.230316,85.230316,85.230316
2020-01-13 21:42:38,2,windows,89.814865,89.814865,89.814865,49.239731,49.239731,49.239731,51.154003,51.154003,51.154003
2020-01-14 04:42:38,2,windows,47.846144,47.846144,47.846144,19.242988,19.242988,19.242988,12.892809,12.892809,12.892809
2020-01-14 12:42:38,2,windows,1.773818,1.773818,1.773818,84.338198,84.338198,84.338198,69.64409,69.64409,69.64409
2020-01-13 14:42:38,11,windows,77.003766,77.003766,77.003766,2.77842,2.77842,2.77842,70.17291,70.17291,70.17291
2020-01-13 21:42:38,11,windows,15.0793,15.0793,15.0793,32.163726,32.163726,32.163726,80.687366,80.687366,80.687366


In [17]:
# Perform a similar query as in the previous example but use a non-SQL query
# and group the results by the `os` label
df = client.read(backend="tsdb", table=tsdb_table, aggregators="avg, max, min",
                 step="1h", group_by="os", start="now-1d", end="now",
                 multi_index=True)
display(df)

time,os,avg(cpu),avg(disk),avg(mem),max(cpu),max(disk),max(mem),min(cpu),min(disk),min(mem)
2020-01-13 14:42:42,linux,77.224543,61.98131,35.037277,95.923701,89.763096,57.823986,58.525384,34.199523,12.250569
2020-01-13 21:42:42,linux,27.489223,9.373381,92.907386,35.710565,15.008668,97.207454,19.26788,3.738094,88.607318
2020-01-14 04:42:42,linux,40.788661,24.481574,40.40741,53.173659,37.586579,45.19139,28.403664,11.376569,35.62343
2020-01-14 12:42:42,linux,69.996349,50.780828,51.551604,85.291765,97.161394,77.834833,54.700933,4.400262,25.268376
2020-01-13 14:42:42,windows,77.701613,7.592678,45.173011,85.230316,12.406936,77.003766,70.17291,2.77842,13.342256
2020-01-13 21:42:42,windows,65.920685,40.701729,52.447083,80.687366,49.239731,89.814865,51.154003,32.163726,15.0793
2020-01-14 04:42:42,windows,41.4531,37.395192,40.09337,70.013391,55.547395,47.846144,12.892809,19.242988,32.340596
2020-01-14 12:42:42,windows,45.323353,55.14209,36.403185,69.64409,84.338198,71.032552,21.002616,25.945982,1.773818


<a id="frames-tsdb-read-conditional"></a>
#### Conditional Read

The following examples demonstrate how to use a query filter to conditionally read only a subset of the data from a TSDB table.<br>

- In non-SQL queries, this is done by setting the value of the `filter` parameter to a [platform filter expression](https://www.iguazio.com/docs/reference/latest-release/expressions/condition-expression/#filter-expression).
- In SQL queries, this is done by setting the `query` parameter to a query string that includes a `WHERE` clause with a platform filter expression expressed as an SQL expression.
  Note that the comparison operator for such queries is `=`, as opposed to `==` in non-SQL queries.
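
The operator difference between the two query types can be sketched with a helper that renders the same label conditions in either form. The `label_filter` function is hypothetical and handles only simple equality conditions joined with `and`.

```python
def label_filter(labels, sql=False):
    """Build an equality filter over labels for a non-SQL or SQL query."""
    operator = "=" if sql else "=="
    return " and ".join(
        f"{name}{operator}'{value}'" for name, value in labels.items()
    )


labels = {"os": "linux", "node": "11"}
print(label_filter(labels))            # os=='linux' and node=='11'
print(label_filter(labels, sql=True))  # os='linux' and node='11'
```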

In [18]:
# Read over-time aggregates with a 1-day aggregation step for all metric
# samples in the table with the `os` label "linux" and the `node` label 11.
df = client.read(backend="tsdb", table=tsdb_table, aggregators="count,sum",
                 step="1d", start="0", filter="os=='linux' and node=='11'",
                 multi_index=True)
display(df)

time,os,node,count(cpu),count(disk),count(mem),sum(cpu),sum(disk),sum(mem)
2020-01-07,linux,11,2.0,2.0,2.0,122.28537,159.920638,140.459197
2020-01-08,linux,11,3.0,3.0,3.0,187.262407,112.0255,93.968675
2020-01-09,linux,11,4.0,4.0,4.0,109.733086,175.528341,148.669437
2020-01-10,linux,11,3.0,3.0,3.0,166.315779,168.336972,189.892164
2020-01-11,linux,11,3.0,3.0,3.0,209.953187,216.125313,123.822148
2020-01-12,linux,11,3.0,3.0,3.0,183.512749,200.427571,163.358412
2020-01-13,linux,11,4.0,4.0,4.0,136.488228,265.328056,189.332497
2020-01-14,linux,11,2.0,2.0,2.0,138.465424,15.776831,60.891805


In [19]:
# Read over-time aggregates with a 15-minute step for `mem` metric samples
# created in the last day with the `os` label "windows" and the `node` label 2, and
# group the results by the `node` label; use an SQL query
query_str = f"select count(mem), sum(mem) from '{tsdb_table}' " + \
    "where os='windows' and node='2' group by node"
df = client.read(backend="tsdb", query=query_str, step="15m",
                 start="now-1d", multi_index=True)
display(df)

time,node,count(mem),sum(mem)
2020-01-13 14:57:51,2,1.0,13.342256
2020-01-13 22:12:51,2,1.0,89.814865
2020-01-14 05:27:51,2,1.0,47.846144
2020-01-14 12:57:51,2,1.0,1.773818


<a id="frames-tsdb-delete"></a>
### Delete the TSDB Table

Use the `delete` method of the Frames client with the `tsdb` backend to delete the TSDB table that was used in the previous steps.

In [20]:
client.delete("tsdb", tsdb_table)

<a id='frames-stream'></a>
## Working with Streams (stream Backend)

The platform supports streams that have an AWS Kinesis-like API. For more information, see the [platform documentation](https://www.iguazio.com/docs/concepts/latest-release/streams/).
<br>
This section demonstrates how to use the `stream` Frames backend to work with streams in the platform.

- [Initialization](#frames-stream-init)
- [Create a Stream](#frames-stream-create)
- [Write to the Stream](#frames-stream-write)
  - [Use the Write Method to Perform a Batch Update](#frames-stream-write-batch-update)
  - [Use the Execute Method's Put Command to Update a Single Record](#frames-stream-execute-put)
- [Read from the Stream](#frames-stream-read)
- [Delete the Stream](#frames-tsdb-stream)

<a id="frames-stream-init"></a>
### Initialization

Start out by defining a stream-path variable that will be used in the tutorial's code examples.<br>
The stream path (`strm`) is relative to the configured parent data container; see [Create a Stream](#frames-stream-create).

In [21]:
# Relative path to the stream within the parent platform data container
strm = os.path.join(os.getenv("V3IO_USERNAME"), "examples", "somestream")

<a id="frames-stream-create"></a>
### Create a Stream

Use the `create` method of the Frames client with the `stream` backend to create a new data stream.<br>
The mandatory `table` parameter specifies the relative stream path within the data container that was configured for the Frames client (see the [main initialization](#frames-init) step).
In the following example, the relative stream path is set by using the `strm` variable that was defined in the [stream backend initialization](#frames-stream-init) step.<br>
You can optionally provide additional arguments.
For example, you can set the `shards` argument to the number of shards in the stream, or you can set the `retention_hours` argument to the stream's retention period in hours.

In [22]:
# Create a new stream
client.create(backend="stream", table=strm, retention_hours=48, shards=1)

<a id="frames-stream-write"></a>
### Write to the Stream

You can use either of the following methods to ingest data into your stream:

- [Use the Write Method to Perform a Batch Update](#frames-stream-write-batch-update)
- [Use the Execute Method's Put Command to Update a Single Record](#frames-stream-execute-put)

<a id="frames-stream-write-batch-update"></a>
#### Use the Write Method to Perform a Batch Update

Use the `write` method of the Frames client with the `stream` backend to ingest multiple records into your stream (batch update), as demonstrated in the following example.<br>
The `dfs` parameter can be set either to a single DataFrame (as done in the following example) or to multiple DataFrames &mdash; either as a DataFrames iterator or as a list of DataFrames.

In [23]:
# Prepare the ingestion data
import numpy as np
num_records = 7
attrs = ["cpu", "mem", "disk"]
df = pd.DataFrame(np.random.rand(num_records, len(attrs)) * 100, columns=attrs)
# Display DataFrame info & content (optional - for testing)
# display(df.info(), df)

# Ingest data into the stream
client.write("stream", table=strm, dfs=df)

<a id="frames-stream-execute-put"></a>
#### Use the Execute Method's Put Command to Update a Single Record

Use the `put` command of the `execute` method of the Frames client with the `stream` backend to add a single record to a stream.<br>
Use the `args` parameter of the `put` command to provide the necessary information:
set the mandatory `data` argument to the ingested record data.
You can optionally set the `client_info` argument to additional metadata and the `partition` argument to a partition key; records with the same partition key are assigned to the same shard.
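
The `args` dictionary for the `put` command can be built from a Python dictionary instead of a hand-written JSON string. The following is a minimal sketch; the `make_put_args` helper is hypothetical, and passing `client_info` and `partition` as plain strings is an assumption.

```python
import json


def make_put_args(record, client_info=None, partition=None):
    """Build the `args` dictionary for the stream backend's `put` command."""
    args = {"data": json.dumps(record)}
    # Optional arguments are added only when they are provided
    if client_info is not None:
        args["client_info"] = client_info
    if partition is not None:
        args["partition"] = partition
    return args


args = make_put_args({"cpu": 12.4, "mem": 31.1, "disk": 12.7},
                     partition="node-11")
print(args["data"])  # {"cpu": 12.4, "mem": 31.1, "disk": 12.7}
```

The result could then be passed as the `args` parameter of `client.execute`.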

In [24]:
client.execute('stream', table=strm, command='put',
               args={'data': '{"cpu": 12.4, "mem": 31.1, "disk": 12.7}'})

<a id="frames-stream-read"></a>
### Read from the Stream

Use the `read` method of the Frames client with the `stream` backend to read data from your stream.<br>
The mandatory `seek` parameter specifies the seek method, which determines the location within the target stream shard from which to read; some methods require setting additional parameters:

- `"earliest"` &mdash; start from the earliest point in the shard (no additional parameters).
- `"latest"` &mdash; start from the latest location in the shard (i.e., consume only new records).
- `"time"` &mdash; start from a specific point in time, as specified in the `start` parameter (for example, `start="now-1d"`).
- `"sequence"` &mdash; start from a specific record sequence number, as specified in the `sequence` parameter (for example, `sequence=45`).

The `read` method can return a single DataFrame (default) or a DataFrames iterator (a stream) if the `iterator` parameter is set to `True`, as demonstrated in the following example.
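
The dependency between the seek method and its companion parameter can be checked before calling `read`. The following is a minimal sketch based only on the list above; the `stream_read_kwargs` helper is hypothetical and not part of Frames.

```python
def stream_read_kwargs(seek, start=None, sequence=None):
    """Validate a seek method and return keyword arguments for `read`."""
    if seek not in ("earliest", "latest", "time", "sequence"):
        raise ValueError(f"unsupported seek method: {seek!r}")
    kwargs = {"seek": seek}
    if seek == "time":
        if start is None:
            raise ValueError('seek="time" requires the `start` parameter')
        kwargs["start"] = start
    elif seek == "sequence":
        if sequence is None:
            raise ValueError('seek="sequence" requires the `sequence` parameter')
        kwargs["sequence"] = sequence
    return kwargs


print(stream_read_kwargs("time", start="now-1d"))
# {'seek': 'time', 'start': 'now-1d'}
```

The returned dictionary could be unpacked into the call, for example `client.read("stream", strm, **kwargs)`.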

In [25]:
# Read from the earliest available location (seek="earliest") in the first stream shard (shard_id=0);
# return the result as a DataFrames iterator (iterator=True) and iterate and print the returned data
dfs = client.read("stream", strm, seek="earliest", shard_id="0", iterator=True)
for df in dfs:
    display(df)

seq_number,cpu,disk,mem,stream_time
1,34.244479,60.44541,83.670392,2020-01-14 13:43:33.320549541
2,62.075647,15.088883,57.285407,2020-01-14 13:43:33.320549541
3,58.084446,46.492305,34.355042,2020-01-14 13:43:33.320549541
4,51.879229,21.485869,45.639052,2020-01-14 13:43:33.320549541
5,90.975499,5.348235,9.850508,2020-01-14 13:43:33.320549541
6,43.596282,83.00803,51.339942,2020-01-14 13:43:33.320549541
7,87.408042,4.624828,28.251473,2020-01-14 13:43:33.320549541
8,12.4,12.7,31.1,2020-01-14 13:43:59.306667440


<a id="frames-tsdb-stream"></a>
### Delete the Stream

Use the `delete` method of the Frames client with the `stream` backend to delete the stream that was used in the previous steps.

In [26]:
client.delete("stream", strm)

<a id="frames-cleanup"></a>
## Cleanup

You can optionally delete any of the directories or files that you created.
See the instructions in the [Creating and Deleting Container Directories](https://www.iguazio.com/docs/tutorials/latest-release/getting-started/containers/#create-delete-container-dirs) tutorial.
For example, the following code uses a local file-system command to delete the entire **&lt;running user&gt;/examples/** directory in the "users" container.
Edit the path, as needed, then remove the comment mark (`#`) and run the code.

In [None]:
#!rm -rf /User/examples/
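
The same cleanup could also be done from Python rather than with a shell command. The following is a minimal sketch; the `remove_dir` helper is hypothetical, and the example runs against a throwaway temporary directory. In the tutorial's environment, the argument would presumably be the `/User/examples/` path from the cell above.

```python
import os
import shutil
import tempfile


def remove_dir(path):
    """Recursively delete `path` if it is an existing directory."""
    path = os.path.abspath(path)
    if path == os.path.abspath(os.sep):
        raise ValueError("refusing to delete the file-system root")
    if os.path.isdir(path):
        shutil.rmtree(path)
        return True
    return False


# Demonstrate on a throwaway temporary directory instead of real data
scratch = tempfile.mkdtemp()
print(remove_dir(scratch))  # True
print(remove_dir(scratch))  # False (already deleted)
```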