In [1]:
import os
import requests
import json
from dotenv import load_dotenv


load_dotenv()

True

In [28]:
with open("application/prompts/history_questions.txt", "r") as f:
    history_questions = f.readlines()

with open("application/prompts/history_answers.txt", "r") as f:
    history_answers = f.readlines()

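# Line i of the questions file pairs with line i of the answers file.
if len(history_questions) != len(history_answers):
    raise ValueError(f"{len(history_questions)} questions but {len(history_answers)} answers")

hist = [
    {"prompt": q.strip(), "response": a.strip()}
    for q, a in zip(history_questions, history_answers)
]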

In [29]:
hist

[{'prompt': 'How do I get started?',
  'response': '# Install integrate.ai SDK and Client\\n\\n## Generate an access token\\n\\nTo install the client and SDK, you must generate an access token through the web portal. \\n1. Log in to your workspace through the portal.\\n2. On the Dashboard, click Generate Access Token.\\n3. Copy the acccess token and save it to a secure location.\\n\\nImportant: This is the only time that the API token can be viewed or downloaded. If you lose or forget your API token, you cannot retrieve it. Instead, create a new API token and revoke the old one. You can manage API tokens through the web portal. \\n\\nTreat your API tokens like passwords and keep them secret. When working with the API, use the token as an environment variable instead of hardcoding it into your programs. In this documentation, the token is referenced as `<IAI_TOKEN>`.\\n\\n## Download the sample notebook\\n\\n\\n## Install integrate.ai packages\\n\\n1. At a command prompt on your machine

In [30]:
from IPython.display import display, Markdown

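def query(question, url="http://0.0.0.0:7091/api/answer", history=hist):
    """POST a question plus the few-shot history to the answer API and render the reply as Markdown."""
    headers = {
        "Content-Type": "application/json; charset=utf-8"
    }

    payload = {
        "question": question,
        "history": history,
        # The same OpenAI key is sent for both the chat model and the embeddings.
        "api_key": os.environ["OPENAI_API_KEY"],
        "embeddings_key": os.environ["OPENAI_API_KEY"],
    }

    # Generous timeout so a stalled server doesn't hang the notebook indefinitely.
    res = requests.post(url=url, data=json.dumps(payload), headers=headers, timeout=120)
    res.raise_for_status()  # fail with the HTTP error instead of a KeyError on "answer"
    display(Markdown(res.json()["answer"]))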
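To exercise the request/response contract that `query()` relies on (a JSON POST with `question`, `history`, `api_key` and `embeddings_key`, answered by JSON with an `answer` field) without the real backend or an OpenAI key, a throwaway local stub can stand in for `/api/answer`. The handler, the helper names, and the echo behaviour are hypothetical and only mirror the fields this notebook uses; only the standard library is needed.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


class StubAnswerHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for /api/answer that echoes the question back."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        answer = f"echo: {payload['question']} (history: {len(payload['history'])})"
        body = json.dumps({"answer": answer}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging to stderr


def start_stub_server():
    """Start the stub on a free localhost port; returns (server, url)."""
    server = HTTPServer(("127.0.0.1", 0), StubAnswerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/api/answer"


def post_question(url, question, history, api_key="", timeout=10.0):
    """Send the same JSON payload shape as query() and return the 'answer' field."""
    payload = {"question": question, "history": history,
               "api_key": api_key, "embeddings_key": api_key}
    req = Request(url, data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json; charset=utf-8"},
                  method="POST")
    with urlopen(req, timeout=timeout) as res:
        return json.loads(res.read())["answer"]
```

For example, `server, url = start_stub_server()` followed by `post_question(url, "How do I get started?", hist)` returns the echoed question; call `server.shutdown()` when finished.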

## IAI DOC

In [5]:
query("How do I get started?")


AI: To get started, you will need to install the integrate.ai SDK and Client. 

1. Generate an access token through the web portal. 
2. Download the sample notebook. 
3. Install the integrate.ai packages using the access token. 
4. *Optional*: If you are building a model for data that includes images or video, follow the steps below for Setting up a Docker GPU Environment. 
5. Create a custom model package. 
6. Train the model. 
7. Deploy the model. 
8. Test the model.

In [10]:
query("How do I use a custom model?")


AI: To use a custom model, you must first create a custom model package. This package contains the model class, the model configuration, and the data configuration. 

1. Choose a name for your custom model, and set the path for the model and data configurations. Note that the name for your custom model **must be unique**. This means that the name for your custom model cannot already be in the Package Name column of the Custom Models Packages Table in the Model Library Page of the UI.

2. Create a model class that inherits from the Model class. This class should contain the model architecture and the methods for training, evaluating, and predicting. 

```python
from integrate_ai_sdk.sample_packages.lstmTagger.model import LSTMTagger

model = LSTMTagger(embedding_dim=4, hidden_dim=3, output_size=4, vocab_size=9)
```

3. Create a model configuration file that contains the model parameters and hyperparameters. 

4. Create a data configuration file that contains the data paths and data preprocessing

In [14]:
query("how do I generate a non-admin token?")


AI: To generate a non-admin token, you must first create a user in the workspace. 

1. Log in to your workspace through the portal.
2. On the Dashboard, click Users.
3. Click Add User.
4. Enter the user's email address and select the appropriate role.
5. Click Generate Access Token.
6. Copy the access token and save it to a secure location.

The user can now use the token to access the workspace.

In [18]:
query("How do I renew my token?")


AI: To renew your token, create a new API token and revoke the old one. You can manage API tokens through the web portal. 

1. Log in to your workspace through the portal.
2. On the Dashboard, click Generate Access Token.
3. Copy the access token and save it to a secure location.
4. Revoke the old token.
5. Use the new token for all future requests.
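The getting-started text in the history above recommends keeping the token in an environment variable instead of hardcoding it; with that setup, renewing a token only means updating one variable (for example in the `.env` file this notebook already loads with `load_dotenv()`). A minimal sketch; the helper is hypothetical and the variable name `IAI_TOKEN` is an assumption borrowed from the `<IAI_TOKEN>` placeholder in those docs.

```python
import os
from typing import Mapping, Optional


def get_iai_token(env: Optional[Mapping[str, str]] = None, var: str = "IAI_TOKEN") -> str:
    """Read the integrate.ai token from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    token = env.get(var, "").strip()  # tolerate a stray newline from copy-paste
    if not token:
        raise RuntimeError(f"{var} is not set; generate a new token in the web portal and export it")
    return token
```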

In [19]:
query("How do I set my differential privacy parameter?")


AI: Differential privacy parameters can be specified during session creation, within the model configuration. 

The following code example shows how to set the differential privacy parameter for a session: 

```python
session_config = SessionConfig(name="My Session",
                               differential_privacy_parameters=DifferentialPrivacyParameters(epsilon=0.1))
```

The `epsilon` parameter is the privacy budget, which is the maximum amount of privacy loss that is allowed. The higher the value, the less privacy is preserved.
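The answer describes epsilon only in words. As a generic illustration (not integrate.ai's mechanism, which isn't shown here), the classic Laplace mechanism adds noise with scale `sensitivity / epsilon`, so a larger epsilon means less noise and weaker privacy. The function names are hypothetical.

```python
import random
from typing import Optional


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: Optional[random.Random] = None) -> float:
    """Release a count (sensitivity 1: one record changes it by at most 1) with epsilon-DP."""
    rng = rng or random.Random()
    return true_count + laplace_noise(laplace_scale(1.0, epsilon), rng)


# Smaller epsilon -> larger noise scale -> stronger privacy, noisier answers.
rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    print(eps, round(private_count(1000, eps, rng), 2))
```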

In [20]:
query("""What does this error mean:
```2023-05-24 02:19:25,244 FLOUR MainThread ERROR | neural_net.py:215 | <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Error in service handler!"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:99.79.32.55:9999 {grpc_message:"Error in service handler!", grpc_status:2, created_time:"2023-05-24T02:19:25.193640106+00:00"}"
>
05/24/2023 02:19:25:ERROR:<_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Error in service handler!"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:99.79.32.55:9999 {grpc_message:"Error in service handler!", grpc_status:2, created_time:"2023-05-24T02:19:25.193640106+00:00"}"
>
CLI0006: Unable to join server for Session 'dd5a90b700'```
""")


AI: This error indicates that there was an error in the service handler. This could be due to a misconfiguration of the service, or an issue with the network connection. Check the network connection and the service configuration to ensure that everything is set up correctly. If the issue persists, contact your system administrator for assistance.
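One concrete first step for the network check suggested above is to confirm that a TCP connection to the peer named in the log (`99.79.32.55:9999` in this particular trace) can be opened at all. A stdlib sketch with a hypothetical helper name; a successful connection only shows the port is reachable, not that the gRPC service behind it is healthy.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False
```

In the notebook, `can_reach("99.79.32.55", 9999)` returning `False` points to a network, firewall, or server-availability problem rather than the session configuration.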

In [21]:
query("""What does this error mean:
```An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.```
""")

AI: This error indicates that the AWS Access Key ID you provided is invalid or does not exist in the AWS records. Please check that the Access Key ID is correct and that it is associated with the correct AWS account.
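Before digging into IAM, it can help to rule out local mistakes in how the key reaches the client. The AWS CLI and SDKs such as boto3 read `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from the environment ahead of the shared credentials file, so a stale or mangled exported value silently wins. The hypothetical helper below flags a few common copy-paste problems; `aws sts get-caller-identity` remains the authoritative check of which identity the credentials resolve to.

```python
import os
from typing import List, Mapping, Optional

_NAMES = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def check_aws_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Flag common local problems with AWS keys exported as environment variables."""
    env = os.environ if env is None else env
    problems: List[str] = []
    for name in _NAMES:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif value != value.strip():
            problems.append(f"{name} has leading/trailing whitespace")
        elif value[0] in "\"'" or value[-1] in "\"'":
            problems.append(f"{name} is wrapped in quotes")
    return problems


for problem in check_aws_env():
    print("WARNING:", problem)
```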

In [22]:
query("What's the best way to get the model weights from an HFL logistic regression analysis?")


AI: The best way to get the model weights from an HFL logistic regression analysis is to use the `get_weights()` method. This method returns a dictionary containing the weights for each feature in the model. 

```python
weights = model.get_weights()
```
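The `get_weights()` method named in the answer isn't confirmed anywhere in this notebook. Assuming you have obtained the weights in the feature-to-weight form the answer describes, plus an intercept (both hypothetical here), a logistic regression model turns them into a probability via the sigmoid of the linear score:

```python
import math
from typing import Mapping


def predict_proba(weights: Mapping[str, float], intercept: float,
                  row: Mapping[str, float]) -> float:
    """P(y = 1 | row) = sigmoid(intercept + sum_i w_i * x_i)."""
    z = intercept + sum(w * row[feature] for feature, w in weights.items())
    if z >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical weights and feature values, for illustration only.
weights = {"age": 0.03, "income": -0.8}
print(predict_proba(weights, intercept=-1.2, row={"age": 40.0, "income": 0.5}))
```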

In [23]:
query("If I wanted to try multi-client training, where one client has Y, X1 and another client has X2 features, how would I setup the data schema and client train commands?")


AI: To set up multi-client training, you will need to split your data into two silos, one for each client. Each silo should contain the features that the client will use for training. For example, if one client has Y, X1 and the other has X2, the first silo should contain Y and X1, and the second silo should contain X2. 

You can then use the following commands to set up the clients for training: 
```python
train_path1 = "s3://iai-client.sample-data-e2e.integrate.ai/train_silo0.parquet"
train_path2 = "s3://iai-client.sample-data-e2e.integrate.ai/train_silo1.parquet"

client_1 = subprocess.Popen(
    f"iai client train --token {IAI_TOKEN} --session {training_session.id} --train-path {data_path}/train_silo0.parquet --test-path {data_path}/test.parquet --batch

In [25]:
query("give me an example of running a session on AWS BATCH")


AI: You can use the integrate.ai API Sample Notebook to run a session on AWS Batch. The notebook provides an example of how to use the integrate.ai SDK to run a session on AWS Batch. 

To run the session on AWS Batch, you need to create a job queue and a compute environment. The job queue defines the resources that will be used to run the session, and the compute environment defines the compute resources that will be used to run the session. 

Once the job queue and compute environment are created, you can submit the session to AWS Batch. The session will be run on the compute resources defined in the compute environment. 

The following code example shows how to submit a session to AWS Batch: 

```python
# Create a job queue
job_queue = batch_client.create_job_queue(job_queue_name='my-job-queue')

# Create a compute environment
compute_environment = batch_client.create_compute_environment(compute_environment_name='my-compute-environment', job_queue_name=job_queue.job_

In [26]:
query("can you give the template for building a custom model")

AI: This is the constructor for the TemplateModel class. It is used to instantiate the model layers based on the configuration settings. It is called when an instance of the TemplateModel class is created. 

```python
from integrate_ai_sdk.base_class import IaiBaseModule

class TemplateModel(IaiBaseModule):
    def __init__(self):
        """
        Here you should instantiate your model layers based on the configs.
        """
        super(TemplateModel, self).__init__()

    def forward(self):
        """
        The forward path of a model. Can take an input tensor and return a prediction tensor
        """
        pass

if __name__ == "__main__":
    template_model = TemplateModel()
```

In [27]:
query("How do I deploy this in AWS?")


AI: To deploy your model in AWS, you will need to set up an AWS Fargate task runner. This will allow you to run your model in the cloud. 

1. Create an Amazon ECR repository to store your model images. 
2. Create an IAM role for your task runner. This role will give your task runner permission to access your data. 
3. Create a task definition for your model. This will define the resources that your model will use. 
4. Create a task execution role for your task runner. This will give your task runner permission to access AWS services. 
5. Create a task runner to run your model. 
6. Start the task runner. 
7. Monitor the task runner to ensure that it is running correctly. 

For more detailed instructions, refer to the AWS Batch and Fargate Manual Setup guide.

In [174]:
query("what evaluation metrics are supported")

AI: Integrate.ai supports the following evaluation metrics: accuracy, precision, recall, and F1 score. Additionally, the Federated Loss value for the latest round of model training is reported as the global_model_federated_loss(float) attribute for an instance of SessionMetrics. This is a model level metric reported for each round of training. It is a weighted average loss across different clients, weighted by the number of examples/samples from each silo. See the metrics by machine learning task in the following table:

| Classification and Logistic | Regression and Normal | Poisson, Gamma, Inverse Gaussian |
|---|---|---|
| Loss (cross-entropy) | Loss (mean squared error) | Loss (unit deviance) |
| ROC-AUC | R2 score | R2 score |
| Accuracy | | |
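
The answer defines the federated loss as an average of per-client losses weighted by each silo's number of samples. A sketch of that arithmetic (the function name is hypothetical; this is not the `SessionMetrics` code):

```python
from typing import Sequence


def federated_loss(client_losses: Sequence[float], client_counts: Sequence[int]) -> float:
    """Average of per-client losses, weighted by each client's number of samples."""
    if len(client_losses) != len(client_counts):
        raise ValueError("need one sample count per client loss")
    total = sum(client_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    return sum(loss * n for loss, n in zip(client_losses, client_counts)) / total


# Two silos: 100 samples with loss 0.5, 300 samples with loss 1.0.
print(federated_loss([0.5, 1.0], [100, 300]))  # 0.875
```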

In [14]:
query("what does GLM mean")

System: GLM stands for Generalized Linear Model. It is a model class that supports a variety of regression models, such as linear regression, logistic regression, Poisson regression, gamma regression and inverse Gaussian regression models. For example, in Python, you can use the statsmodels library to fit a GLM:

```Python
import statsmodels.api as sm
model = sm.GLM(y, X)
results = model.fit()
```
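The metrics table above lists unit deviance as the loss for the Poisson, Gamma and inverse Gaussian families. A sketch of the standard textbook unit deviances (plus squared error for the normal family); how integrate.ai aggregates them across rows or silos isn't shown here, so the plain mean below is an assumption.

```python
import math
from typing import Sequence

FAMILIES = ("normal", "poisson", "gamma", "inverse_gaussian")


def unit_deviance(y: float, mu: float, family: str) -> float:
    """Unit deviance d(y, mu) for common GLM families."""
    if family not in FAMILIES:
        raise ValueError(f"unknown family: {family!r}")
    if family == "normal":
        return (y - mu) ** 2  # squared error
    if mu <= 0:
        raise ValueError("mu must be positive for this family")
    if family == "poisson":
        if y < 0:
            raise ValueError("Poisson requires y >= 0")
        term = y * math.log(y / mu) if y > 0 else 0.0  # y*log(y/mu) -> 0 as y -> 0
        return 2.0 * (term - (y - mu))
    if y <= 0:
        raise ValueError("Gamma and inverse Gaussian require y > 0")
    if family == "gamma":
        return 2.0 * (-math.log(y / mu) + (y - mu) / mu)
    return (y - mu) ** 2 / (mu ** 2 * y)  # inverse Gaussian


def mean_deviance(ys: Sequence[float], mus: Sequence[float], family: str) -> float:
    """Plain average of the unit deviances over a dataset."""
    if len(ys) != len(mus) or not ys:
        raise ValueError("ys and mus must be non-empty and the same length")
    return sum(unit_deviance(y, m, family) for y, m in zip(ys, mus)) / len(ys)
```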

In [16]:
query("how do I train a GLM with integrateai")

System: You can train a GLM with integrateai by using the iai_glm model. This model treats GLMs as a special case of single-layer neural nets with particular output activation functions. Here is an example of how to use the iai_glm model in Python:

```Python
from integrateai.models import iai_glm

model = iai_glm(input_dim=10, output_dim=1)
model.fit(X_train, y_train)
```
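The answer describes GLMs as single-layer neural nets whose output activation is the inverse of the link function. That idea can be sketched in plain Python; the helper names and family keys below are hypothetical and are not the `iai_glm` interface.

```python
import math
from typing import Callable, Dict, Sequence


def _sigmoid(eta: float) -> float:
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


# Output activation = inverse of the family's link function.
INVERSE_LINKS: Dict[str, Callable[[float], float]] = {
    "normal": lambda eta: eta,   # identity link
    "logistic": _sigmoid,        # logit link
    "poisson": math.exp,         # log link
}


def glm_predict(weights: Sequence[float], bias: float, x: Sequence[float], family: str) -> float:
    """One linear layer (w . x + b) followed by the family's inverse link."""
    if len(weights) != len(x):
        raise ValueError("weights and x must have the same length")
    eta = bias + sum(w * xi for w, xi in zip(weights, x))
    return INVERSE_LINKS[family](eta)
```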

In [36]:
query("how to create an EDA session")



System: To create an EDA session, you must first create a session, the same as you would for training a model. In this case, to configure the session, you must specify either the dataset_config, or num_datasets argument. Using a dataset_config file: The dataset_config file is a configuration file that maps the name of one or more datasets to the columns to be pulled. Dataset names and column names are specified as key-value pairs in the file. For each pair, the keys are dataset names that are expected for the EDA analysis. The values are a list of corresponding columns. The list of columns can be specified as column names (strings), column indices (integers), or a blank list to retrieve all columns from that particular dataset. If a dataset name is not included in the configuration file, all columns from that dataset are used by default.
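The column rules in the paragraph above (column names, column indices, or an empty list for all columns; datasets left out of the config contribute all their columns) can be expressed as a small resolver. This is a hypothetical helper illustrating the semantics, not part of the integrate.ai SDK; the dataset names in the test data are made up, and 0-based indices are an assumption.

```python
from typing import Dict, List, Mapping, Sequence, Union

ColumnSpec = Union[str, int]  # a column name, or a column index (assumed 0-based)


def resolve_columns(
    dataset_config: Mapping[str, Sequence[ColumnSpec]],
    available: Mapping[str, Sequence[str]],
) -> Dict[str, List[str]]:
    """Return the concrete column names to pull from each dataset."""
    unknown = set(dataset_config) - set(available)
    if unknown:
        raise KeyError(f"datasets in config but not available: {sorted(unknown)}")
    resolved: Dict[str, List[str]] = {}
    for name, all_columns in available.items():
        spec = dataset_config.get(name, [])  # dataset not in config -> all columns
        if not spec:                          # empty list -> all columns
            resolved[name] = list(all_columns)
            continue
        cols: List[str] = []
        for item in spec:
            if isinstance(item, int):
                cols.append(all_columns[item])  # column index
            elif item in all_columns:
                cols.append(item)               # column name
            else:
                raise KeyError(f"dataset {name!r} has no column {item!r}")
        resolved[name] = cols
    return resolved
```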

To create an EDA session, we specify a `dataset_config` dictionary indicating the columns to explore for each dataset. Here the empty list `[]` means to include all columns. 
For example:
```python
eda_session = client.create_eda_session(
    name="Testing notebook -

In [17]:
query("give me an example of data config for PRL sessions")



System: Sure, here is an example of a PRL data config for a session with two clients: 
```python
prl_data_config = {
    "clients": {
        "passive_client": {"id_columns": ["id"]},
        "active_client": {"id_columns": ["id"]},
    }
}
```
This data config specifies that the two clients are providing data and that the "id" column in any provided datasets will be used to link the two datasets.
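To see which records a config like this would link, the non-private sketch below simply intersects the ID values from each client's rows. The purpose of PRL is to find this overlap without either side revealing its full ID list, which a plain set intersection does not do; the helper names and data are hypothetical.

```python
from typing import Any, Iterable, List, Mapping, Set, Tuple

Row = Mapping[str, Any]
Key = Tuple[Any, ...]


def id_keys(rows: Iterable[Row], id_columns: List[str]) -> Set[Key]:
    """Composite key built from the configured ID columns for every row."""
    return {tuple(row[c] for c in id_columns) for row in rows}


def overlapping_ids(config: Mapping[str, Any], datasets: Mapping[str, Iterable[Row]]) -> Set[Key]:
    """IDs present in every client's data: a plaintext intersection, NOT privacy-preserving."""
    clients = config["clients"]
    if not clients:
        raise ValueError("config lists no clients")
    key_sets = [id_keys(datasets[name], spec["id_columns"]) for name, spec in clients.items()]
    return set.intersection(*key_sets)
```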

In [18]:
query("which strategies are currently supported for HFL")

System: Currently, the iai_ffnet model supports classification and regression for HFL. Additionally, the integrate.ai HFL Gradient Boosting Methods are also supported.

In [19]:
query("what is a VFL session")

A VFL session is a type of machine learning session used to link two datasets together. It is used to create a predictive model that can be used to make predictions about future data. In a VFL session, the training data is used to create a model that can be used to make predictions about the test data.
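In vertical federated learning (VFL), clients hold different feature columns for overlapping records, as in the earlier multi-client question where one client has Y and X1 and another has X2. The sketch below only illustrates that data layout on a made-up table: it splits the columns between two hypothetical clients and shows which rows line up on the shared ID. It is not how a VFL session trains, since there the columns are never pooled in one place.

```python
from typing import Any, Dict, List, Mapping, Sequence

Row = Dict[str, Any]


def split_vertically(rows: Sequence[Row], id_col: str,
                     parts: Mapping[str, Sequence[str]]) -> Dict[str, List[Row]]:
    """Give each client the shared ID column plus only its own feature columns."""
    return {
        client: [{id_col: r[id_col], **{c: r[c] for c in cols}} for r in rows]
        for client, cols in parts.items()
    }


def align_on_id(left: Sequence[Row], right: Sequence[Row], id_col: str) -> List[Row]:
    """Inner-join two partitions on the ID; illustration only, real VFL never pools columns."""
    right_by_id = {r[id_col]: r for r in right}
    return [{**row, **right_by_id[row[id_col]]} for row in left if row[id_col] in right_by_id]
```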

## Default (Pandas)

In [None]:
query("what is pandas")

In [None]:
query("how to load parquet files?")

In [None]:
query("how to load parquet files directly with pandas")

In [None]:
query("gimme an example of computing the moving average of all columns in a dataframe")