## Tapis PEARC20 Demo

In this notebook, we use Tapis to store and analyze streaming data generated from code simulating a sensor. We introduce a number of Tapis services and concepts along the way.

![Alt text](images/tapis_demo_overview.png "Tapis Demo Overview")

### Tapis Python SDK, Tenants and Authentication

In this notebook, we will use the official Tapis Python SDK for all of our interactions with the services. The Python SDK provides Python-native methods and objects for making HTTP requests to the Tapis API and parsing its responses.

In order to do just about anything with Tapis, we will need to authenticate. Tapis makes heavy use of the notion of "tenants" in order to provide isolation for different projects. By setting the base_url variable, you indicate to the Tapis SDK which tenant you wish to interact with.

For the demo, we will be using the "dev" tenant which has a base URL of "https://dev.tapis.io". This tenant is an internal tenant the Tapis core development team can use to try out services. The demo will make use of two test accounts in the dev tenant.

The "TACC tenant", with base URL "https://tacc.tapis.io", allows individuals to authenticate using any valid TACC account. For other tenants, the authentication rules could be different. 

Authentication in the "dev" and "TACC" tenants uses OAuth2 (again, this could be different in other tenants), but the Tapis Python SDK hides some of the complexity inherent in OAuth2 by providing convenience functions for common use cases. For example, we can generate an access token using just our username and password via the convenience function "get_tokens()". We do this below:

In [None]:
#Set account credentials for authenticating to the Tapis tenant configured below
import getpass
permitted_username = getpass.getpass(prompt='Username: ', stream=None)
permitted_user_password = getpass.getpass(prompt='Password: ', stream=None)

#Set Tapis Tenant and Base URL
tenant="dev"
base_url = 'https://dev.develop.tapis.io'

#Load Python SDK
from tapipy.tapis import Tapis

#Create python Tapis client for user
permitted_client = Tapis(base_url= base_url, 
                         username=permitted_username,
                         password=permitted_user_password, 
                         account_type='user', 
                         tenant_id=tenant,
                        ) 

#Generate an Access Token that will be used for all API calls
permitted_client.get_tokens()

In Tapis, access tokens (and refresh tokens) are simply JSON Web Tokens (JWTs). The access_token Python object created and managed by the Python SDK has attributes on it that include the "raw" JWT string as well as claims associated with the JWT. Services use the claims to determine what actions a user is authorized to take. In particular, the "sub" (subject) claim uniquely identifies a user inside Tapis. 
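
To make the idea of claims concrete, here is a minimal sketch of how the payload of any JWT can be inspected with the standard library alone. The token is built locally for illustration, and its claim values are hypothetical; it is not a real Tapis token, and the sketch does not verify signatures, which a real service must do.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    """Base64url-encode bytes without padding, as JWT segments are."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_jwt_claims(token: str) -> dict:
    """Return the claims (payload) of a JWT without verifying its signature."""
    _header_b64, payload_b64, _signature = token.split(".")
    # Restore the padding that the JWT encoding strips off.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical token built locally for illustration; not a real Tapis token.
sample_claims = {"sub": "testuser2@dev", "iss": "https://example.invalid/tokens"}
sample_token = ".".join([
    b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps(sample_claims).encode()),
    "signature-placeholder",
])

claims = decode_jwt_claims(sample_token)
print(claims["sub"])
```

The same function could be pointed at the raw JWT string held by the SDK's access token object, although the SDK already exposes the claims for you.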

In [None]:
#Show the access token object generated
permitted_client.access_token

Note also the ttl (time-to-live) claim; Tapis tokens have a finite lifetime, typically a few hours, configurable by tenant. After the token expires, we will need to get a new token in order to continue interacting with Tapis. The Python SDK has convenience methods for managing tokens and even automatically refreshing a token.
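
As a rough illustration of what managing a token lifetime involves, the sketch below decides whether a token should be refreshed. It assumes the claims carry the standard JWT `exp` claim (seconds since the Unix epoch); the function names and the five-minute margin are hypothetical and are not part of the Tapis SDK.

```python
from datetime import datetime, timedelta, timezone


def seconds_remaining(claims: dict, now: datetime) -> float:
    """Seconds until the token expires, based on the standard `exp` claim."""
    expires_at = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
    return (expires_at - now).total_seconds()


def needs_refresh(claims: dict, now: datetime, margin_seconds: float = 300) -> bool:
    """True when the token is expired or within `margin_seconds` of expiring."""
    return seconds_remaining(claims, now) <= margin_seconds


# Hypothetical claims: a token issued at `issued` with a four-hour lifetime.
issued = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
example_claims = {"exp": int((issued + timedelta(hours=4)).timestamp())}

fresh = needs_refresh(example_claims, now=issued + timedelta(hours=1))
stale = needs_refresh(example_claims, now=issued + timedelta(hours=3, minutes=58))
print(fresh, stale)
```

Refreshing slightly before expiry, rather than exactly at it, avoids a request failing because the token expired while in flight.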

## Streams API

![Alt text](images/streams-api.png "a title")

In [None]:
#Setup Streams Variables that are used in the rest of the notebook
import datetime
storage_id = "tapis-demo"
project_id ='wq_demo_tapis_streams_proj'+ str(datetime.datetime.today().isoformat())
site_id = 'wq_demo_site'
instrument_id = 'Ohio_River_Robert_C_Byrd_Locks'+ str(datetime.datetime.today().isoformat()).replace(':','_').replace('.','-')

### Project and Metadata Setup
Projects sit at the top of the hierarchy of Streams resources. A user registers a project by providing metadata such as the principal investigator, project URL, funding resource, etc. Authorized users can be assigned to project roles to control access to the project's resources. When a project is first registered, a collection is created in the back-end MongoDB, and user permissions for this collection are set up in the security kernel. Every request to access the project or the documents within it (i.e., sites, instruments, variables) goes through a security kernel check, and only requests from authorized users are processed.

In [None]:
## Create Streams Project
result, debug = permitted_client.streams.create_project(project_name=project_id,
                                                        description='project for early adopters demo',
                                                        owner='testuser2', pi='ajamthe', 
                                                        funding_resource='tapis', 
                                                        project_url='test.tacc.utexas.edu',
                                                        active=True,_tapis_debug=True)
print(result)

![Alt text](images/stream-mongo.png "a title")

#### Create Site
A site is a geographical location that may hold one or more instruments. Sites are next in the Streams hierarchy and inherit their permissions from the project. Project owners create sites by providing geographical information such as the latitude, longitude and elevation of the site, or GeoJSON-encoded spatial information. This spatial information is useful when searching for sites or data by location. In the back-end database, a site is represented as a JSON document within the project collection.
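
For readers unfamiliar with GeoJSON, the following sketch shows how a latitude/longitude/elevation triple maps onto a GeoJSON Point. The helper name and the choice of properties are hypothetical, and the exact shape Streams expects for GeoJSON input is not shown in this notebook, so treat this only as an illustration of the encoding.

```python
import json


def site_to_geojson(site: dict) -> dict:
    """Build a GeoJSON Feature (Point geometry) from a site dictionary."""
    # GeoJSON positions are ordered longitude, latitude, then optional elevation.
    coordinates = [site["longitude"], site["latitude"], site["elevation"]]
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": coordinates},
        "properties": {
            "site_id": site["site_id"],
            "description": site["description"],
        },
    }


# Example values matching the site created in this notebook.
example_site = {
    "site_id": "wq_demo_site",
    "latitude": 50,
    "longitude": 10,
    "elevation": 2,
    "description": "test_site",
}
feature = site_to_geojson(example_site)
print(json.dumps(feature, indent=2))
```

Note the coordinate order: GeoJSON puts longitude first, which is the reverse of the common "lat, long" convention.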


In [None]:
## Create a Streams Site
result, debug = permitted_client.streams.create_site(project_id=project_id,
                                                     request_body=[{
                                                     "site_name":site_id, 
                                                     "site_id":site_id,
                                                     "latitude":50, 
                                                     "longitude":10, 
                                                     "elevation":2,
                                                     "description":'test_site'
                                                    }], _tapis_debug=True)
print(result)

In [None]:
#Edit Site
result, debug = permitted_client.streams.update_site(project_id=project_id,
                                             site_name=site_id, 
                                             site_id=site_id,
                                             latitude=90, 
                                             longitude = 90, 
                                             elevation=2,
                                             description='edited_site', _tapis_debug=True)
print(result)

In [None]:
#Create sites in bulk
result, debug = permitted_client.streams.create_site(project_id=project_id,request_body=[{
                                                     "site_name":site_id+"_2", 
                                                     "site_id":site_id+"_2",
                                                     "latitude":50, 
                                                     "longitude":10, 
                                                     "elevation":2,
                                                     "description":'test_site2'
                                                    },{
                                                     "site_name":site_id+'_3', 
                                                     "site_id":site_id+'_3',
                                                     "latitude":50, 
                                                     "longitude":10, 
                                                     "elevation":2,
                                                     "description":'test_site3'
                                                    }
                                                    ], _tapis_debug=True)
print(result)

#### Create Instrument
Instruments are physical entities that may have one or more embedded sensors for measuring parameters such as temperature, relative humidity, specific conductivity, etc. These sensors, referred to as variables in the Streams API, generate measurements, which are stored in InfluxDB along with an ISO 8601 timestamp. Instruments are associated with specific sites and projects. Information about an instrument, such as its site and project IDs, name and description, is stored in the MongoDB site JSON document.
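
Since every measurement carries an ISO 8601 timestamp, it can help to generate timestamps that are unambiguous about their time zone. The sketch below is one way to do that with the standard library; the helper name is hypothetical, and whether Streams requires UTC is an assumption not confirmed by this notebook.

```python
from datetime import datetime, timedelta, timezone


def utc_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing 'Z'."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")


# The same instant expressed in UTC and in a UTC-5 offset.
in_utc = datetime(2021, 1, 1, 12, 30, tzinfo=timezone.utc)
in_local = datetime(2021, 1, 1, 7, 30, tzinfo=timezone(timedelta(hours=-5)))

print(utc_timestamp(in_utc))
print(utc_timestamp(in_local))
```

Both calls print the same string, because the two datetimes describe the same instant.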

In [None]:
## Create Instruments
result, debug = permitted_client.streams.create_instrument(project_id=project_id,site_id=site_id,
                                                           request_body=[{
                                                           "inst_name":instrument_id,
                                                           "inst_description":"demo instrument",
                                                           "inst_id":instrument_id
                                                           }], _tapis_debug=True)
print(result)

In [None]:
#Edit Instrument
result, debug = permitted_client.streams.update_instrument(project_id=project_id,
                                                           topic_category_id ='2',
                                                           site_id=site_id,
                                                           inst_name=instrument_id,
                                                           inst_description='edited instrument',
                                                           inst_id=instrument_id, _tapis_debug=True)
print(result)

In [None]:
## Create instruments in bulk
result, debug = permitted_client.streams.create_instrument(project_id=project_id,site_id=site_id,
                                                           request_body=[{
                                                           "inst_name":instrument_id+'_2',
                                                           "inst_description":"demo instrument",
                                                           "inst_id":instrument_id+'_2'
                                                           },{
                                                           "inst_name":instrument_id+'_3',
                                                           "inst_description":"demo instrument",
                                                           "inst_id":instrument_id+'_3'
                                                           }], _tapis_debug=True)
print(result)

#### Create Variables
Variables are associated with specific instruments. When creating a variable, the user provides information such as the variable's name, the property measured, the unit of measurement, etc. For example, a variable for a temperature sensor can store measurements in degrees Celsius or Fahrenheit.
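
The project, site, instrument and variable levels described so far form a simple tree. The toy model below is only a mental picture of that hierarchy, written with dataclasses; the class names and the `unit` field are hypothetical and do not mirror the Streams data model or the SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    var_id: str
    var_name: str
    unit: str  # hypothetical field, not set by the request bodies in this notebook


@dataclass
class Instrument:
    inst_id: str
    variables: List[Variable] = field(default_factory=list)


@dataclass
class Site:
    site_id: str
    instruments: List[Instrument] = field(default_factory=list)


@dataclass
class Project:
    project_id: str
    sites: List[Site] = field(default_factory=list)

    def variable_paths(self) -> List[str]:
        """List every variable as project/site/instrument/variable."""
        return [
            f"{self.project_id}/{site.site_id}/{inst.inst_id}/{var.var_id}"
            for site in self.sites
            for inst in site.instruments
            for var in inst.variables
        ]


demo = Project("wq_demo", [Site("wq_demo_site", [Instrument("byrd_locks", [
    Variable("temp", "temperature", "degC"),
    Variable("ph", "ph_level", "pH"),
])])])
paths = demo.variable_paths()
print(paths)
```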

In [None]:
#Create variables in bulk
result, debug = permitted_client.streams.create_variable(project_id=project_id,
                                                         site_id=site_id,
                                                         inst_id=instrument_id,
                                                         request_body=[
                                                         {
                                                         "topic_category_id" :"2",
                                                         "var_name":"temperature", 
                                                         "shortname":"temp","var_id":"temp"
                                                         },
                                                         {
                                                          "topic_category_id" :"2",
                                                         "var_name":"ph_level", 
                                                         "shortname":"ph","var_id":"ph"
                                                         },{
                                                          "topic_category_id" :"2",
                                                         "var_name":"battery", 
                                                         "shortname":"batv","var_id":"batv"
                                                         },{
                                                         "topic_category_id" :"2",
                                                         "var_name":"turbidity", 
                                                         "shortname":"turb","var_id":"turb"
                                                         },{
                                                         "topic_category_id" :"2",
                                                         "var_name":"specific_conductivity", 
                                                         "shortname":"spc","var_id":"spc"
                                                         }
                                                         ],_tapis_debug=True)
print(result)

### Write Measurements
Measurements are the actual values recorded for the variables, and they are stored in the time-series database InfluxDB. Project owners or users can download these measurements by providing a time window of measurement creation, and retrieve the data in CSV or JSON format. This data can also be processed in real time with the help of the Channels API.
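
To illustrate what selecting a time window and exporting CSV amounts to, here is a small local sketch using only the standard library. It is not how the Streams service implements downloads; the function names and sample records are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone


def filter_by_window(records, start, end):
    """Keep records whose ISO 8601 `datetime` falls within [start, end]."""
    return [
        r for r in records
        if start <= datetime.fromisoformat(r["datetime"]) <= end
    ]


def to_csv(records, columns):
    """Serialize records to CSV text with the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical measurements, one per record, keyed by variable id.
records = [
    {"datetime": "2021-01-01T00:00:00+00:00", "temp": 85, "ph": 7},
    {"datetime": "2021-06-01T12:30:00+00:00", "temp": 88, "ph": 6},
    {"datetime": "2022-01-01T00:00:00+00:00", "temp": 86, "ph": 8},
]
start = datetime(2021, 1, 1, tzinfo=timezone.utc)
end = datetime(2021, 12, 31, tzinfo=timezone.utc)

selected = filter_by_window(records, start, end)
csv_text = to_csv(selected, ["datetime", "temp", "ph"])
print(csv_text)
```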

In [None]:
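#Write Measurements - this is our sensor simulator
#Import the datetime module (not the class) so that the earlier
#datetime.datetime.today() setup cell still works if it is re-run
import datetime
import random
from random import randint
variables = []
#generate 10 sensor records
for i in range(0, 10):
    datetime_now = datetime.datetime.now().isoformat()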
    variables.append({"temp": randint(85, 89),
                        "spc": randint(240, 300),
                        "turb": randint(10, 19),
                        "ph": randint(1, 10),
                        "batv": round(random.uniform(10, 13), 2),
                        "datetime":datetime_now
                        })
#write observations to measurements endpoint for our instrument
result = permitted_client.streams.create_measurement(inst_id=instrument_id, vars=variables)
print(result)

### Download Measurements
Download the measurements we just created from our instrument.

In [None]:
#Download measurements as CSV
result = permitted_client.streams.list_measurements(inst_id=instrument_id,
                                                    project_id=project_id, 
                                                    site_id=site_id,
                                                    start_date='2021-01-01T00:00:00Z',
                                                    end_date='2025-12-30T22:19:25Z',
                                                    format='csv')
result

In [None]:
#Read Measurements to Data Frame
import pandas as pd
from io import StringIO
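#Wrap the CSV bytes in a text buffer (named csv_buffer to avoid shadowing the built-in input)
csv_buffer = StringIO(str(result,'utf-8'))
df = pd.read_csv(csv_buffer)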
df['datetime']=pd.to_datetime(df['time'])
df.set_index('datetime',inplace=True)
df.pop('time')
df

In [None]:
# Plot Measurements in the DataFrame
import matplotlib.pyplot as plt
import matplotlib.dates as md
%matplotlib inline
xfmt = md.DateFormatter('%H:%M:%S')
df.plot(lw=1, colormap='jet', marker='.', 
        markersize=12, title='Timeseries Stream Output', rot=90).xaxis.set_major_formatter(xfmt)
plt.tight_layout()
plt.legend(loc='best')
plt.savefig('test.png')

## Transfer data to a storage system

We can take the measurement data we have just written and send it as a CSV or JSON file to a Tapis storage system.

In [None]:
result = permitted_client.streams.transfer_data(filename="mytestfile.csv",
                                       system_id= storage_id,
                                       path="/test-directory-e2e/",
                                       inst_id=instrument_id,
                                       data_format="csv",
                                       start_date="2021-01-01T00:00:00Z",
                                       end_date="2025-12-30T22:19:25Z"
                                      )
print(result)

In [None]:
#View files on the storage system we just transferred our instrument measurements to
permitted_client.files.listFiles(systemId=storage_id,path="/")

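In [None]:
#Developer-only override: this address looks like a local development server.
#Skip this cell unless you are running the Streams service at that address.
permitted_client.base_url = "http://192.168.56.1:5001"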

## Archive or backup a Project

We can create backups of a Streams project's data and metadata and send them to a Tapis storage system. Currently, archives happen on a one-time basis, but a cron-like schedule within Streams is planned for a future release.
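
Conceptually, an archive in "zip" format with "csv" data bundles a project's metadata and measurements into a single file. The sketch below builds such a bundle in memory with the standard library; the file names inside the zip and the metadata layout are hypothetical and do not reflect the actual structure of a Streams archive.

```python
import io
import json
import zipfile


def build_archive(metadata: dict, csv_text: str) -> bytes:
    """Bundle project metadata (JSON) and measurements (CSV) into a zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("metadata.json", json.dumps(metadata, indent=2))
        archive.writestr("measurements.csv", csv_text)
    return buffer.getvalue()


# Hypothetical project content; the file names inside the zip are also made up.
metadata = {"project_id": "wq_demo", "sites": ["wq_demo_site"]}
csv_text = "time,temp,ph\n2021-01-01T00:00:00Z,85,7\n"

archive_bytes = build_archive(metadata, csv_text)

# Read the archive back to confirm that both members round-trip.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    names = archive.namelist()
    restored = json.loads(archive.read("metadata.json"))
print(names)
```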

In [None]:
#Create a Project Archive
result = permitted_client.streams.archive_project(project_id=project_id,
                                                  archive_type='system',
                                                  owner='sean',
                                                  settings={
                                                    "system_id":storage_id,
                                                    "path":"/test-directory-e2e/",
                                                    "archive_format":"zip",
                                                    "data_format":"csv",
                                                    "frequency": "one-time"
                                                  })
print(result)

In [None]:
#View files on the storage system we just archived to
permitted_client.files.listFiles(systemId=storage_id,path="/")

In [None]:
#list project archives
result = permitted_client.streams.list_archives(project_id=project_id)
print(result)