In [1]:
import os
import requests
import json
from dotenv import load_dotenv


load_dotenv()

True

In [2]:
from IPython.display import display, Markdown


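def query(question, url="http://0.0.0.0:7091/api/answer", history=None):
    """Send a question to the local DocsGPT server and render the answer as Markdown."""
    headers = {
        "Content-Type": "application/json; charset=utf-8"
    }

    # The same key is sent in both the `api_key` and `embeddings_key` fields.
    payload = {
        "question": question,
        "history": history,
        "api_key": os.environ["OPENAI_API_KEY"],
        "embeddings_key": os.environ["OPENAI_API_KEY"],
    }

    res = requests.post(url=url, data=json.dumps(payload), headers=headers)
    res.raise_for_status()  # surface HTTP errors before parsing the body
    display(Markdown(res.json()["answer"]))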
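
The request body that `query` sends can be built and inspected without a running server. The sketch below uses a hypothetical helper, `build_payload`, which is not part of DocsGPT. It assumes the same four fields as `query` and takes the key as an argument instead of reading it from the environment; the key shown is a placeholder.

```python
import json


def build_payload(question, api_key, history=None):
    """Return the JSON body that `query` would POST (hypothetical helper)."""
    payload = {
        "question": question,
        "history": history,  # None is serialized as JSON null
        "api_key": api_key,
        "embeddings_key": api_key,  # same key reused, as in `query`
    }
    return json.dumps(payload)


body = build_payload("How do I get started?", api_key="sk-placeholder")
print(body)
```

Printing the body this way is a quick check that `history` and the key fields look as expected before sending anything to the server.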

## IAI DOC

In [3]:
query("How do I get started?")



System: To get started, you will need to install the integrate.ai command-line tool (CLI), the SDK, and the client. For detailed instructions, please refer to the documentation provided. After that, you will need to set up the Fargate environment, as described in the documentation.

In [4]:
query("How do I deploy this in AWS?")



DocsGPT: To deploy this in AWS, you will need to set up a task runner using AWS Batch and Fargate. You can find instructions for manual setup in the AWS documentation here: https://docs.aws.amazon.com/batch/latest/userguide/Batch_Manual_Setup.html. Once you have set up the task runner, you will need to make sure that AWS is up and running, that there is an ECR that can store the client image, and that your batch job has permission to access your data. Finally, you will need to set up and manage the server.

In [5]:
query("How do I use a custom model?")



DocsGPT: To use a custom model, you will need to create a custom package and define a model class. You can then choose a name for your custom model, and set the path for the model and data configurations. Note that the name for your custom model **must be unique**. This means that the name for your custom model cannot already be in the Package Name column of the Custom Models Packages Table in the Model Library Page of the UI. 

For example, you can use the following code to create a custom model using the model class defined in a custom package:
```python
from integrate_ai_sdk.sample_packages.lstmTagger.model import LSTMTagger

model = LSTMTagger(embedding_dim=4, hidden_dim=3, output_size=4, vocab_size=9)
```

In [6]:
query("how do I generate a non-admin token?")



You can generate a non-admin token by following these steps: 
1. Log in to your integrate.ai account on the web. 
2. On the Dashboard, click Generate Access Token.
3. Select the Non-Admin option.
4. Copy the access token and save it to a secure location.

In [7]:
query("How do I renew my token?")



System: You can renew your token by setting the 'renewable' parameter to True when creating the token. For example, in Python:
```Python
token = jwt.encode({'renewable': True}, secret_key)
```

In [8]:
query("How do I set my differential privacy parameter?")



DocsGPT: You can set your differential privacy parameter when you create a session in integrate.ai. For example, in Python you can use the following code:
```Python
session = integrate.ai.Session(
    privacy_parameter=0.1
)
```
This will set the privacy parameter to 0.1.

In [9]:
query("If I wanted to try multi-client training, where one client has Y, X1 and another client has X2 features, how would I setup the data schema and client train commands?")



DocsGPT: To set up multi-client training with one client having Y, X1 features and another client having X2 features, you will need to create two separate training datasets. The first dataset should contain the Y, X1 features and the second dataset should contain the X2 features. You can then use the following code to set up the client train commands: 

```Python
train_path1 = "s3://iai-client.sample-data-e2e.integrate.ai/train_silo0.parquet"
train_path2 = "s3://iai-client.sample-data-e2e.integrate.ai/train_silo1.parquet"

client_1 = subprocess.Popen(
    f"iai client train --token {IAI_TOKEN} --session {training_session.id} --train-path {train_path1} --test-path {data_path}/test.parquet --batch-size 1024 --client-name client-1 --remove-after-complete",
    shell=True,
    stdout=subprocess.PIPE
)
```

In [10]:
query("I got Error code 401. What does it mean")

?

System: Error code 401 means Unauthorized - Your API key is wrong.

In [11]:
query("give me an example of running a session on AWS BATCH")



DocsGPT: You can use the following code example to run a session on AWS Batch:

```Python
import boto3

# Create a Batch client
batch = boto3.client('batch')

# Create a job queue
job_queue = batch.create_job_queue(
    jobQueueName='my-job-queue',
    state='ENABLED'
)

# Create a job definition
job_definition = batch.register_job_definition(
    jobDefinitionName='my-job-definition',
    type='container',
    containerProperties={
        'image': 'integrateai/api-sample-notebook:latest',
        'vcpus': 1,
        'memory': 1024,
        'command': [
            'python',
            'run_client.py'
        ]
    }
)

# Submit a job
job = batch.submit_job(
    jobName='my-job',
    jobQueue=job_queue['jobQueueName'],
    jobDefinition=job_definition['jobDefinitionName']
)
```

In [12]:
query("can you give the template for building a custom model")

?

Sure! Here is the template for building a custom model:

```python
class MyCustomModel(Model):
    def __init__(self, config):
        super(MyCustomModel, self).__init__(config)
        # Add your custom model code here
        # ...
    
    def forward(self, inputs):
        # Add your custom model code here
        # ...
        return outputs
```

Remember to name your custom model class and the associated .json file with the same name. The name must be unique and contain no spaces, no special characters, and all lowercase characters.

In [13]:
query("what evaluation metrics are supported")

 by the SessionMetrics class?

The SessionMetrics class supports the following evaluation metrics: accuracy, precision, recall, F1 score, loss (cross-entropy for classification and logistic tasks, mean squared error for regression and normal tasks, and unit deviance for Poisson, Gamma, and Inverse Gaussian tasks), and ROC-AUC and R2 score for all tasks.
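
For reference, the four classification metrics named above have standard definitions in terms of true and false positives and negatives. The sketch below computes them in plain Python for binary 0/1 labels. The function name `binary_metrics` is hypothetical, and this is not a description of how `SessionMetrics` is implemented.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
```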

In [14]:
query("what does GLM mean")

?

System: GLM stands for Generalized Linear Model. It is a model class that supports a variety of regression models, such as linear regression, logistic regression, Poisson regression, gamma regression and inverse Gaussian regression models. For example, in Python, you can use the statsmodels library to fit a GLM:

```Python
import statsmodels.api as sm
model = sm.GLM(y, X)
results = model.fit()
```

In [16]:
query("how do I train a GLM with integrateai")

?

System: You can train a GLM with integrateai by using the iai_glm model. This model treats GLMs as a special case of single-layer neural nets with particular output activation functions. Here is an example of how to use the iai_glm model in Python:

```Python
from integrateai.models import iai_glm

model = iai_glm(input_dim=10, output_dim=1)
model.fit(X_train, y_train)
```
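
The description above, a GLM as a single linear layer followed by a family-specific output activation, can be illustrated without any SDK. The sketch below is plain Python. `glm_predict` and the family names are hypothetical and do not reflect the `iai_glm` API.

```python
import math

# Inverse link ("output activation") for a few common GLM families.
INVERSE_LINKS = {
    "normal": lambda eta: eta,  # identity link: linear regression
    "logistic": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logit link
    "poisson": math.exp,  # log link: Poisson regression
}


def glm_predict(x, weights, bias, family):
    """Single linear layer followed by the family's inverse link."""
    eta = bias + sum(w * xi for w, xi in zip(weights, x))  # linear predictor
    return INVERSE_LINKS[family](eta)


x, weights, bias = [1.0, 2.0], [0.5, -0.25], 0.0
for family in INVERSE_LINKS:
    print(family, glm_predict(x, weights, bias, family))
```

Only the output activation changes between families; the linear layer is shared, which is what makes the single-layer-network view convenient.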

In [15]:
query("how to create an EDA session")



System: To create an EDA session, you must first create a session, the same as you would for training a model. In this case, to configure the session, you must specify either the dataset_config or the num_datasets argument.

Using a dataset_config file: The dataset_config file is a configuration file that maps the name of one or more datasets to the columns to be pulled. Dataset names and column names are specified as key-value pairs in the file. For each pair, the key is a dataset name that is expected for the EDA analysis, and the value is a list of the corresponding columns.

The list of columns can be specified as column names (strings), column indices (integers), or a blank list to retrieve all columns from that particular dataset. If a dataset name is not included in the configuration file, all columns from that dataset are used by default.

To create an EDA session, we specify a `dataset_config` dictionary indicating the columns to explore for each dataset. Here the empty list `[]` means to include all columns. 
For example:
```python
eda_session = client.create_eda_session(
    name="Testing notebook -
```

In [17]:
query("give me an example of data config for PRL sessions")



System: Sure, here is an example of a PRL data config for a session with two clients: 
```python
prl_data_config = {
    "clients": {
        "passive_client": {"id_columns": ["id"]},
        "active_client": {"id_columns": ["id"]},
    }
}
```
This data config specifies that the two clients are providing data and that the "id" column in any provided datasets will be used to link the two datasets.

In [18]:
query("which strategies are currently supported for HFL")

?

System: Currently, the iai_ffnet model supports classification and regression for HFL. Additionally, the integrate.ai HFL Gradient Boosting Methods are also supported.

In [19]:
query("what is a VFL session")

?

A VFL session is a type of machine learning session used to link two datasets together. It is used to create a predictive model that can be used to make predictions about future data. In a VFL session, the training data is used to create a model that can be used to make predictions about the test data.

## Default (Pandas)

In [None]:
query("what is pandas")

In [None]:
query("how to load parquet files?")

In [None]:
query("how to load parquet files directly with pandas")

In [None]:
query("gimme an example of computing the moving average of all columns in a dataframe")