In [1]:
import os
import requests
import json
from dotenv import load_dotenv


load_dotenv()

True

In [2]:
with open("application/prompts/history_questions.txt", "r") as f:
    history_questions = f.readlines()

with open("application/prompts/history_answers.txt", "r") as f:
    history_answers = f.readlines()

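# Pair each question with its answer; both files hold one entry per line.
assert len(history_questions) == len(history_answers), "question/answer files differ in length"
hist = [
    {"prompt": q.strip(), "response": a}
    for q, a in zip(history_questions, history_answers)
]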

In [3]:
hist

[{'prompt': 'How do I get started?',
  'response': '# Install integrate.ai SDK and Client\\n\\n## Generate an access token\\n\\nTo install the client and SDK, you must generate an access token through the web portal. \\n1. Log in to your workspace through the portal.\\n2. On the Dashboard, click Generate Access Token.\\n3. Copy the acccess token and save it to a secure location.\\n\\nImportant: This is the only time that the API token can be viewed or downloaded. If you lose or forget your API token, you cannot retrieve it. Instead, create a new API token and revoke the old one. You can manage API tokens through the web portal. \\n\\nTreat your API tokens like passwords and keep them secret. When working with the API, use the token as an environment variable instead of hardcoding it into your programs. In this documentation, the token is referenced as `<IAI_TOKEN>`.\\n\\n## Download the sample notebook\\n\\n\\n## Install integrate.ai packages\\n\\n1. At a command prompt on your machine

In [59]:
from IPython.display import display, Markdown

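def query(question, url="http://0.0.0.0:7091/api/answer", history=None):
    """Send a question to the local answer API, render the answer, and print its sources."""
    headers = {
        "Content-Type": "application/json; charset=utf-8"
    }

    payload = {
        "question": question,
        # Fall back to the history loaded above when the caller passes none.
        "history": hist if history is None else history,
        "api_key": os.environ["OPENAI_API_KEY"],
        "embeddings_key": os.environ["OPENAI_API_KEY"],
    }

    res = requests.post(url=url, data=json.dumps(payload), headers=headers)
    res.raise_for_status()  # fail early on HTTP errors instead of on a missing key
    body = res.json()  # parse the response once and reuse it

    display(Markdown(body["answer"]))
    print("-------------------------------------")
    for doc in body["question_sources"]:
        print(doc)
    print("-------------------------------------")
    for doc in body["answer_sources"]:
        print(doc)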
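
The raw source dictionaries printed by `query` are hard to scan. As a rough sketch, a helper like the one below could condense each source to a single line. It assumes only the keys visible in the outputs of this notebook (`score` as a string, `text`, `title`). The name `format_sources`, the `width` parameter, and the ascending sort by score (which matches the order seen in the outputs) are illustrative choices, not part of the API, and the sample entries are abbreviated.

```python
def format_sources(docs, width=60):
    """Return one summary line per source: score, title, and a text preview."""
    lines = []
    for doc in sorted(docs, key=lambda d: float(d["score"])):
        # Collapse newlines and repeated spaces so the preview fits on one line.
        preview = " ".join(doc["text"].split())
        if len(preview) > width:
            preview = preview[:width].rstrip() + "..."
        lines.append(f"{float(doc['score']):.3f}  {doc['title']}  {preview}")
    return lines


# Abbreviated sample entries in the shape returned by the answer API.
sample = [
    {"score": "0.46205103", "text": "\n\nRequirements\n\n", "title": "input/aws-batch-manual.md"},
    {"score": "0.45616212", "text": "\n\nDownload the sample notebook\n\n\n", "title": "input/install-sdk.md"},
]
for line in format_sources(sample):
    print(line)
```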

## IAI DOC

In [61]:
query("How do I get started?")


AI: To get started, you will need to install the integrate.ai SDK and Client. 

1. Generate an access token through the web portal. Log in to your workspace through the portal, and on the Dashboard, click Generate Access Token. Copy the acccess token and save it to a secure location. 

2. Download the sample notebook. 

3. Install the integrate.ai packages. At a command prompt on your machine, run the following command to install the management tool for the SDK and client: `pip install integrate-ai`. Install the SDK package using the access token you generated. `iai sdk install --token <IAI_TOKEN>`. Install the integrate.ai client using the same access token. The client is a Docker image that runs in a container. `iai client pull --token <IAI_TOKEN>`. 

*Optional*: If you are building a model for data that includes images or video, follow the steps below for Setting up a Docker GPU Environment.

-------------------------------------
{'score': '0.45616212', 'text': '\n\nDownload the sample notebook\n\n\n', 'title': 'input/install-sdk.md'}
{'score': '0.46205103', 'text': '\n\nRequirements\n\nThis section outlines the setup steps required to configure your working environment. Steps that are performed in the AWS platform are not explained in detail. Refer to the AWS documentation as needed. \n\nThe requirements are tool-agnostic - that is, you can complete the steps through the AWS console, or through a tool such as Terraform or AWS CloudFormation. \n\n', 'title': 'input/aws-batch-manual.md'}
{'score': '0.4785251', 'text': '\n\nWindows Setup\n\n1. Ensure that intel VT-x or AMD SVM is enabled in BIOS, check the motherboard manufacture document for exact steps.\n2. Install CUDA driver or CUDA toolkit:\n* Install cuda toolkit (which include driver, but also contains other unnecessary components)\n** In the component selection screen, you can choose to install only the CUDA driver\n*

In [62]:
query("Do you have sample code?")


AI: Yes, I have sample code for a variety of tasks. For example, here is sample code for creating a scoped token for a user: 

```python
token = auth_client.create_token(user_id=user_name, scopes=[Scope.create_session, Scope.read_user_session])
print(token)p
```

This request returns the unique user ID (the generated email), a list of the granted scopes, and the token, as well as the token ID and the user name. 

Copy and save the token somewhere secure to share with the user.

-------------------------------------
{'score': '0.3772081', 'text': '\n\nDownload the sample notebook\n\n\n', 'title': 'input/install-sdk.md'}
{'score': '0.44656748', 'text': '\n\nintegrate.ai HFL Gradient Boosting Methods Sample Notebook\n\n', 'title': 'input/integrateai_api_gbm.md'}
{'score': '0.46398148', 'text': '\n\nintegrate.ai API Sample Notebook\n\n', 'title': 'input/integrateai_api.md'}
-------------------------------------
{'score': '0.14310782', 'text': '\n\nCreate a scoped user token\n\nCreate a scoped token for the user. Include only the scopes that the user requires to work with the system and their data. \n\n```python\ntoken = auth_client.create_token(user_id=user_name, scopes=[Scope.create_session, Scope.read_user_session])\nprint(token)p\n```\n\nThis request returns the unique user ID (the generated email), a list of the granted scopes, and the token, as well as the token ID and the user name. \n\nCopy and save the token somewhere secure to share with the user. \n\n**

In [63]:
query("How do I use a custom model?")


AI: To use a custom model, you must first create a custom model package. Follow the naming convention for files in the custom package: no spaces, no special characters, no hyphens, all lowercase characters.

1. Create a folder to contain your custom model package. For this tutorial, this folder is named myCustomModel, and is located in the same parent folder as the template folder.

Example path: C:\<workspace>\integrate_ai_sdk\sample_packages\myCustomModel

2. Create two files in the custom model package folder: 
    a. `model.py` - the custom model definition. You can rename the template_model.py as a starting point for this file.
    b. `<model-class-name>.json` - default model inputs for this model. It must have the same name as the model class name that is defined in the model.py file. 

If you are using the template files, the default name is `templatemodel.json`.

3. _Optional_: To use a custom dataloader, you must also create

-------------------------------------
{'score': '0.37047327', 'text': '\n\nCustom model, dataset, and LSTMTagger.json\nChoose a name for your custom model, and set the path for the model and data configurations.  \nNote that the name for your custom model **must be unique**.  \nThis means that the name for your custom model cannot already be in the Package Name column of the Custom Models Packages Table in the Model Library Page of the UI.\n\n\n\n```python\n', 'title': 'input/integrateai_custom_lstm.md'}
{'score': '0.4369074', 'text': '\n\nLoad the saved model\n\nTo load a model saved previously, a model object needs to be initialized first. This can be done by directly importing one of the IAI-supported packages (e.g., FFNet) or using the model class defined in a custom package. \n\n\n```python\nfrom integrate_ai_sdk.sample_packages.lstmTagger.model import LSTMTagger\n\nmodel = LSTMTagger(embedding_dim=4, hidden_dim=3, output_size=4, vocab_size=9)\n\n', 'title': 'input/integrateai_cus

In [64]:
query("how do I generate a non-admin token?")


AI: Create a scoped token for the user. Include only the scopes that the user requires to work with the system and their data. 

```python
token = auth_client.create_token(user_id=user_name, scopes=[Scope.create_session, Scope.read_user_session])
print(token)p
```

This request returns the unique user ID (the generated email), a list of the granted scopes, and the token, as well as the token ID and the user name. 

Copy and save the token somewhere secure to share with the user.

-------------------------------------
{'score': '0.43495193', 'text': '\n\nVerify user and token through the UI\n\nTo confirm that the user and token were created successfully, you can also view them in the web dashboard. \n\n1. Log in to the web dashboard.\n2. Click Token Management.\n2. Click User Scoped Tokens.\n3. Locate the user name for the user you created. \n\n', 'title': 'input/user-auth.md'}
{'score': '0.45202243', 'text': "\n\nCustom Models\n\n\n\n\n \n\nUser Authentication\n\nSharing access to training sessions and shared models in a simple and secure manner is a key requirement for many data custodians. integrate.ai provides a secure method of authenticating end users with limited permissions through the SDK to enable privileged access. \n\nAs the user responsible for managing access through the integrate.ai platform, you have the ability to generate an **unscoped API token** through the integrate.ai UI. Unscoped API tokens provide full access to the integrate.ai SDK. You 

In [65]:
query("How do I renew my token?")


AI: To renew your token, create a new API token and revoke the old one. You can manage API tokens through the web portal. 

1. Log in to your workspace through the portal.
2. On the Dashboard, click Generate Access Token.
3. Copy the acccess token and save it to a secure location.
4. Revoke the old token.
5. Use the new token for all future requests.

-------------------------------------
{'score': '0.3911299', 'text': '\n\nVerify user and token through the UI\n\nTo confirm that the user and token were created successfully, you can also view them in the web dashboard. \n\n1. Log in to the web dashboard.\n2. Click Token Management.\n2. Click User Scoped Tokens.\n3. Locate the user name for the user you created. \n\n', 'title': 'input/user-auth.md'}
{'score': '0.46478128', 'text': "\n\nRevoke a scoped token\n\nUser scoped tokens have a default lifespan of thirty (30) days. To revoke a token before it expires, use the revoke_token command in the SDK. \n\nYou must provide the `token_id` for the token that you want to revoke. You can find this ID in the web dashboard.\n\n```python\nauth_client.revoke_token(token['token_id'])\n```\n\n", 'title': 'input/user-auth.md'}
{'score': '0.47211307', 'text': '\n\nInstall integrate.ai SDK and Client\n\n \n\nGenerate an access token\n\nTo install the client and SDK, you must generate an access token 

In [66]:
query("How do I set my differential privacy parameter?")


AI: Differential privacy parameters can be specified during session creation, within the model configuration. 

The following code example shows how to set the differential privacy parameter for a session: 

```python
session_config = SessionConfig(name="My Session",
                               differential_privacy_parameters=DifferentialPrivacyParameters(epsilon=0.1))
```

The `epsilon` parameter is the privacy budget, which is the maximum amount of privacy loss that is allowed. The higher the value, the less privacy is preserved.

-------------------------------------
{'score': '0.3922711', 'text': '\n\nControl Plane\n\nIn this deployment scenario, the integrate.ai system manages all components hosted in the customer\'s environment using a limited permission role granted by the customer. \n\n \n\nDifferential Privacy\n\nWhat is differential privacy?\nDifferential privacy is a technique for providing a provable measure of how “private” a data set can be. This is achieved by adding a certain amount of noise when responding to a query on the data. A balance needs to be struck between adding too much noise (making the computation less useful), and too little (reducing the privacy of the underlying data).\nThe technique introduces the concept of a privacy-loss parameter (typically represented by ε (epsilon)), which can be thought of as the amount of noise to add for each invocation of some computation on data. A related concept is the privacy budget, which can be chosen by the data curator.\nThis privacy budget repre
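
To make the role of `epsilon` more concrete: in the textbook Laplace mechanism, noise is drawn with scale `sensitivity / epsilon`, so a smaller epsilon means more noise and stronger privacy. The sketch below is a generic, standard-library illustration of that relationship. It is not the integrate.ai implementation, and `laplace_scale` and `noisy_sum` are hypothetical names.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale of the Laplace mechanism: larger epsilon gives smaller noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def noisy_sum(values, epsilon, sensitivity=1.0, rng=None):
    """Return sum(values) plus Laplace noise calibrated to epsilon."""
    if rng is None:
        rng = random.Random()
    scale = laplace_scale(sensitivity, epsilon)
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(values) + noise


rng = random.Random(0)
for eps in (0.1, 1.0, 10.0):
    # The printed scale shrinks as epsilon grows; the noisy sums vary per draw.
    print(eps, laplace_scale(1.0, eps), noisy_sum([1, 0, 1, 1], eps, rng=rng))
```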

In [67]:
query("""What does this error mean:
```2023-05-24 02:19:25,244 FLOUR MainThread ERROR | neural_net.py:215 | <_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Error in service handler!"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:99.79.32.55:9999 {grpc_message:"Error in service handler!", grpc_status:2, created_time:"2023-05-24T02:19:25.193640106+00:00"}"
>
05/24/2023 02:19:25:ERROR:<_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "Error in service handler!"
	debug_error_string = "UNKNOWN:Error received from peer ipv4:99.79.32.55:9999 {grpc_message:"Error in service handler!", grpc_status:2, created_time:"2023-05-24T02:19:25.193640106+00:00"}"
>
CLI0006: Unable to join server for Session 'dd5a90b700'```
""")


AI: This error indicates that there was an error in the service handler. This could be due to a misconfiguration of the service, or an issue with the network connection. Check the service configuration and network connection to ensure that everything is set up correctly. If the issue persists, contact your system administrator for assistance.

-------------------------------------
{'score': '0.5782596', 'text': "\n\nErrors\n\n\nThis error section is stored in a separate file in includes/_errors.md. Slate allows you to optionally separate out your docs into many files...just save them to the includes folder and add them to the top of your index.md's frontmatter. Files are included in the order listed.\n\n\nThe Kittn API uses the following error codes:\n\n\nError Code | Meaning\n---------- | -------\n400 | Bad Request -- Your request is invalid.\n401 | Unauthorized -- Your API key is wrong.\n403 | Forbidden -- The kitten requested is hidden for administrators only.\n404 | Not Found -- The specified kitten could not be found.\n405 | Method Not Allowed -- You tried to access a kitten with an invalid method.\n406 | Not Acceptable -- You requested a format that isn't json.\n410 | Gone -- The kitten requested has been removed from our servers.\n418 | I'm a teapot.\n429 | Too Many Requests -- You're requesting too many kittens! Slow

In [68]:
query("""What does this error mean:
```An error occurred (InvalidAccessKeyId) when calling the GetObject operation: The AWS Access Key Id you provided does not exist in our records.```
""")

AI: This error indicates that the AWS Access Key ID you provided is invalid or does not exist in the AWS records. Please check that the Access Key ID is correct and that it is associated with the correct AWS account.

-------------------------------------
{'score': '0.4605072', 'text': '\n\nThis parameter is used by the server start-up to extract the access token value from SSM\nssm_token_key = "sample_session_token"\nssm = boto3.client(\'ssm\', region_name=\'\')\nssm.put_parameter(\n    Name=ssm_token_key,\n    Value=IAI_TOKEN,\n    Overwrite=True,\n    Type=\'SecureString\',\n    KeyId=\'\' # Use your own KMS keyID here\n)\n```\n\n', 'title': 'input/aws-fargate-sdk.md'}
{'score': '0.4925499', 'text': '\n\nAuthentication\n\nComplete the steps to create an IAI access token, hereafter referred to as .\n\nOn the AWS CLI, set the token as a parameter for your SSM agent. SSM handles getting and using the token as needed for the batch session.\n\n```python\naws ssm put-parameter --name iai-token --value  --type SecureString\n```\n\nExample response:\n\n```\n{\n    "Version": 1,\n    "Tier": "Standard"\n}\n```\n\n \n\nAbout "secrets"\n\nIn order for the batch job to access the integrate.ai JWT through SSM

In [69]:
query("What's the best way to get the model weights from an HFL logistic regression analysis?")


AI: The best way to get the model weights from an HFL logistic regression analysis is to use the `get_weights()` method. This method returns a dictionary containing the weights for each feature in the model. 

```python
weights = model.get_weights()
```

-------------------------------------
{'score': '0.4343601', 'text': '\n\nintegrate.ai HFL Gradient Boosting Methods Sample Notebook\n\n', 'title': 'input/integrateai_api_gbm.md'}
{'score': '0.46388203', 'text': '\n\nGradient Boosted Models (HFL-GBM)\n\nGradient boosting is a machine learning algorithm for building predictive models that helps minimize the bias error of the model. The gradient boosting model provided by integrate.ai is an HFL model that uses the sklearn implementation of HistGradientBoostingClassifier for classifier tasks and HistGradientBoostingRegresssor for regression tasks.\n\nThe GBM sample notebook (integrateai_api_gbm.ipynb) provides sample code for running the SDK, and should be used in parallel with this tutorial. This documentation provides supplementary and conceptual information to expand on the code demonstration.\n\n', 'title': 'input/hfl-gbm.md'}
{'score': '0.4657126', 'text': '\n\nHFL FFNet\n\nThe iai_ffnet model is a feedforward neural network for hori

In [70]:
query("If I wanted to try multi-client training, where one client has Y, X1 and another client has X2 features, how would I setup the data schema and client train commands?")


AI: To set up multi-client training, you will need to split your data into two silos, one for each client. Each silo should contain the features that the client will use for training. For example, if one client has Y, X1 and the other has X2, the first silo should contain Y and X1, and the second silo should contain X2. 

You can then use the following commands to set up the clients for training: 

```python
train_path1 = "s3://iai-client.sample-data-e2e.integrate.ai/train_silo0.parquet"
train_path2 = "s3://iai-client.sample-data-e2e.integrate.ai/train_silo1.parquet"

client_1 = subprocess.Popen(
    f"iai client train --token {IAI_TOKEN} --session {training_session.id} --train-path {train_path1} --test-path {data_path}/test.parquet --batch-size 1024 --client-

-------------------------------------
{'score': '0.46817732', 'text': '\n\nRun Training Client jobs on AWS Batch\n\n', 'title': 'input/integrateai_batch_client.md'}
{'score': '0.47253546', 'text': '\n\nRemote datasets\n\nOne of the key features of the integrate.ai platform is the ability to work with datasets without having to colocate them. \nThe integrate.ai client and SDK are capable of working with datasets that are hosted remotely on AWS S3. You must set up and configure the AWS CLI to use S3 datasets.\n\n', 'title': 'input/data-requirements.md'}
{'score': '0.48136765', 'text': '\n\nTest data path in s3\ntrain_path1 = "s3://iai-client.sample-data-e2e.integrate.ai/train_silo0.parquet"\ntrain_path2 = "s3://iai-client.sample-data-e2e.integrate.ai/train_silo1.parquet"\ntest_path = "s3://iai-client.sample-data-e2e.integrate.ai/test.parquet"\n', 'title': 'input/integrateai_batch_client.md'}
-------------------------------------
{'score': '0.29123285', 'text': '\n\nStart a training sessi

In [71]:
query("give me an example of running a session on AWS BATCH")


AI: You can use the integrate.ai API Sample Notebook to run a session on AWS Batch. 

1. Create an AWS Batch job queue. 
2. Create an AWS Batch compute environment. 
3. Create an AWS Batch job definition. 
4. Submit the job to the job queue. 
5. Monitor the job status. 

For more information, see the [AWS Batch Documentation](https://docs.aws.amazon.com/batch/latest/userguide/what-is-batch.html).

-------------------------------------
{'score': '0.22747159', 'text': '\n\nRun Training Client jobs on AWS Batch\n\n', 'title': 'input/integrateai_batch_client.md'}
{'score': '0.2741452', 'text': '\n\nRun EDA Client jobs on AWS Batch\n\n', 'title': 'input/integrateai_batch_client.md'}
{'score': '0.31078166', 'text': '\n\nintegrate.ai API Sample Notebook to run client on AWS Batch\n\n', 'title': 'input/integrateai_batch_client.md'}
-------------------------------------
{'score': '0.16450123', 'text': '\n\nintegrate.ai API Sample Notebook to run client on AWS Batch\n\n', 'title': 'input/integrateai_batch_client.md'}
{'score': '0.19896787', 'text': '\n\nintegrate.ai API Sample Notebook to run client on AWS Batch and AWS Fargate\n\n', 'title': 'input/integrateai_batch_client_vfl.md'}
{'score': '0.19896787', 'text': '\n\nintegrate.ai API Sample Notebook to run client on AWS Batch and AWS Fargate\n\n', 'title': 'input/integrateai_fargate_batch_client_vfl.md'}


In [72]:
query("can you give the template for building a custom model")

AI: This is the constructor for the TemplateModel class. It is used to instantiate the model layers based on the configuration settings. It is called when an instance of the TemplateModel class is created. 

```python
from integrate_ai_sdk.base_class import IaiBaseModule

class TemplateModel(IaiBaseModule):
    def __init__(self):
        """
        Here you should instantiate your model layers based on the configs.
        """
        super(TemplateModel, self).__init__()

    def forward(self):
        """
        The forward path of a model. Can take an input tensor and return a prediction tensor
        """
        pass

if __name__ == "__main__":
    template_model = TemplateModel()
```

-------------------------------------
{'score': '0.43804806', 'text': '\n\nCustom model, dataset, and LSTMTagger.json\nChoose a name for your custom model, and set the path for the model and data configurations.  \nNote that the name for your custom model **must be unique**.  \nThis means that the name for your custom model cannot already be in the Package Name column of the Custom Models Packages Table in the Model Library Page of the UI.\n\n\n\n```python\n', 'title': 'input/integrateai_custom_lstm.md'}
{'score': '0.47086468', 'text': '\n\nDownload the sample notebook\n\n\n', 'title': 'input/install-sdk.md'}
{'score': '0.48273015', 'text': '\n\nintegrate.ai HFL Gradient Boosting Methods Sample Notebook\n\n', 'title': 'input/integrateai_api_gbm.md'}
-------------------------------------
{'score': '0.2297841', 'text': '\n\nFunction name: __init__\n\nFunction: \n```\ndef __init__(self):\n        """\n        Here you should instantiate your model layers based on the configs.\n        """

In [73]:
query("How do I deploy this in AWS?")


AI: To deploy your model in AWS, you will need to set up an AWS Fargate task runner. This will allow you to run your model in the cloud. 

1. Create an Amazon ECR repository to store your model images. 
2. Create an IAM role for your task runner. This role will give your task runner permission to access your data. 
3. Create a task definition for your model. This will define the resources that your model will use. 
4. Create a task execution role for your task runner. This will give your task runner permission to access AWS services. 
5. Create a task runner to run your model. 
6. Start the task runner. 
7. Monitor the task runner to ensure that it is running correctly. 

For more detailed instructions, refer to the AWS Batch and Fargate Manual Setup guide.

-------------------------------------
{'score': '0.3719529', 'text': '\n\nRequirements\n\nThis section outlines the setup steps required to configure your working environment. Steps that are performed in the AWS platform are not explained in detail. Refer to the AWS documentation as needed. \n\nThe requirements are tool-agnostic - that is, you can complete the steps through the AWS console, or through a tool such as Terraform or AWS CloudFormation. \n\n', 'title': 'input/aws-batch-manual.md'}
{'score': '0.3770063', 'text': '\n\nRun Training Server on AWS Fargate\n\n', 'title': 'input/integrateai_fargate_server.md'}
{'score': '0.37755996', 'text': '\n\nAWS Batch and Fargate Manual Setup\n\n', 'title': 'input/aws-batch-manual.md'}
-------------------------------------
{'score': '0.24117789', 'text': '\n\nManaged Cloud-hosted\n\nFocused on model building, they don’t have IT team. They need job to run when they execute. Set up task runners - make sure AWS is up and running, there’s an ecr 

In [74]:
query("what evaluation metrics are supported")

AI: Integrate.ai supports the following evaluation metrics: accuracy, precision, recall, and F1 score. Additionally, the Federated Loss value for the latest round of model training is reported as the global_model_federated_loss(float) attribute for an instance of SessionMetrics. This is a model level metric reported for each round of training. It is a weighted average loss across different clients, weighted by the number of examples/samples from each silo. See the metrics by machine learning task in the following table:

| Classification and Logistic | Regression and Normal | Poisson, Gamma, Inverse Gaussian |
| Loss (cross-entropy) | Loss (mean squared error) | Loss (unit deviance) |
| ROC-AUC | R2 score | R2 score |
| Accuracy | | |

-------------------------------------
{'score': '0.50972795', 'text': '\n\nintegrate.ai HFL Gradient Boosting Methods Sample Notebook\n\n', 'title': 'input/integrateai_api_gbm.md'}
{'score': '0.51639086', 'text': '\n\nDeployment Scenarios\n\n\n', 'title': 'input/deployment.md'}
{'score': '0.5164914', 'text': '\n\nEvaluation Metrics\n\nWhen the session is complete, you can see a set of metrics for all rounds of training, as well as metrics for the final model.\nRetrieve Metrics for a Session\nUse the SessionMetrics class of the API to store and retrieve metrics for a session. You can retrieve the model performance metrics as a dictionary (Dict), or plot them. See the API class reference for details. \nTypical usage example:\nclient = connect("token") \n\nalready_trained_session_id = ""\n\nsession = client.fl_session(already_trained_session_id)\n\n \n\nretrieve the metrics for the session as a dictionary\nmetrics = session.metrics.as_dict()\nAuthenticate to and connect to the integrate.a
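
The answer above describes the federated loss as an average of client losses weighted by each silo's sample count. A minimal sketch of that arithmetic follows; `federated_loss` and its argument names are hypothetical, and the SDK's own computation may differ in detail.

```python
def federated_loss(client_losses, client_sample_counts):
    """Average per-client losses, weighting each by its number of samples."""
    if len(client_losses) != len(client_sample_counts):
        raise ValueError("need one sample count per client loss")
    total = sum(client_sample_counts)
    if total == 0:
        raise ValueError("no samples to weight by")
    weighted = sum(loss * n for loss, n in zip(client_losses, client_sample_counts))
    return weighted / total


# Two silos: the larger one (300 samples) pulls the average toward its loss.
print(federated_loss([0.5, 1.0], [100, 300]))  # 0.875
```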

In [75]:
query("what does GLM mean")

AI: GLM stands for Generalized Linear Model. It is a model class that supports a variety of regression models, such as linear regression, logistic regression, Poisson regression, gamma regression and inverse Gaussian regression models.

-------------------------------------
{'score': '0.31024972', 'text': '\n\nGLM\n\n#GBM\n\n#LSTM\n\n', 'title': 'input/train-overview.md'}
{'score': '0.52677596', 'text': '\n\nVFL SplitNN\n\nintegrate.ai also supports the SplitNN model for vertical federated learning (VFL). In this model, neural networks are trained with data across multiple clients. A PRL (private-record linking) session is required for all datasets involved. There are two types of sessions: train, and predict. To make predictions, the PRL session ID and the corresponding training session ID are required. \n\nFor more information, see  and .\n\n \n\nGeneralized Linear Models (GLMs)\n\nThis model class supports a variety of regression models. Examples include linear regression, logistic regression, Poisson regression, gamma regression and inverse Gaussian regression models. We also support regularizing the model coefficients with the elastic net penalty.\n\nExamples of use cases include [1]:\n\n* Agriculture / weather m

In [76]:
query("how do I train a GLM with integrate.ai")

AI: You can use the iai_glm model to train generalized linear models. The iai_glm model is a special case of single-layer neural nets with particular output activation functions. 

To train a GLM with integrate.ai, you need to create a model package that contains the model definition and the default model inputs. 

1. Create a folder to contain your custom model package. For this tutorial, this folder is named myCustomModel, and is located in the same parent folder as the template folder.

Example path: C:\<workspace>\integrate_ai_sdk\sample_packages\myCustomModel

2. Create two files in the custom model package folder: 
    a. `model.py` - the custom model definition. You can rename the template_model.py as a starting point for this file.
    b. `<model-class-name>.json` - default model inputs for this model. It must have the same name as the model class name that is defined in the model.py file. 

If you are using the template files, the default name is

-------------------------------------
{'score': '0.36423275', 'text': '\n\nintegrate.ai API Sample Notebook\n\n', 'title': 'input/integrateai_api.md'}
{'score': '0.36423275', 'text': '\n\nintegrate.ai API Sample Notebook\n\n', 'title': 'input/integrateai_eda_intersect_batch.md'}
{'score': '0.37011167', 'text': '\n\nVFL SplitNN\n\nintegrate.ai also supports the SplitNN model for vertical federated learning (VFL). In this model, neural networks are trained with data across multiple clients. A PRL (private-record linking) session is required for all datasets involved. There are two types of sessions: train, and predict. To make predictions, the PRL session ID and the corresponding training session ID are required. \n\nFor more information, see  and .\n\n \n\nGeneralized Linear Models (GLMs)\n\nThis model class supports a variety of regression models. Examples include linear regression, logistic regression, Poisson regression, gamma regression and inverse Gaussian regression models. We als
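
The description of a GLM as a single-layer network with a particular output activation can be illustrated in a few lines: a linear combination of the features is passed through an inverse link function. This is a generic sketch, not the `iai_glm` implementation. `glm_predict` and `INVERSE_LINKS` are hypothetical names, and only three common families are shown (the gamma and inverse Gaussian families are omitted).

```python
import math

# Output activation (inverse link) per family; the log link is assumed for Poisson.
INVERSE_LINKS = {
    "linear": lambda z: z,
    "logistic": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "poisson": lambda z: math.exp(z),
}


def glm_predict(features, weights, bias, family):
    """Single-layer prediction: inverse_link(bias + weights . features)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return INVERSE_LINKS[family](z)


x = [1.0, 2.0]
w = [0.5, -0.25]
for family in INVERSE_LINKS:
    print(family, glm_predict(x, w, 0.0, family))
```

The same weights give different predictions per family because only the output activation changes.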

In [77]:
query("how to create an EDA session")

AI: To begin exploratory data analysis, you must first create a session, the same as you would for training a model. In this case, to configure the session, you must specify either the dataset_config, or num_datasets argument. Using a dataset_config file: The dataset_config file is a configuration file that maps the name of one or more datasets to the columns to be pulled. Dataset names and column names are specified as key-value pairs in the file. For each pair, the keys are dataset names that are expected for the EDA analysis. The values are a list of corresponding columns. The list of columns can be specified as column names (strings), column indices (integers), or a blank list to retrieve all columns from that particular dataset. If a dataset name is not included in the configuration file, all columns from that dataset are used by default.

To create an EDA session, we specify a `dataset_config` dictionary indicating the columns to explore for each dataset. Here the empty list `[]` means to include all columns. 
For information more information on how to configure an EDA session, see the documentation here.

```python
eda_session

-------------------------------------
{'score': '0.33949137', 'text': '\n\nStart an EDA Session using IAI client\nFollow the documentation on directions for how to install the integrate_ai package and the sample data.\nUnzip the sample data to your `~/Downloads` directory, otherwise update the `data_path` below to point to the sample data.\n\n\n```python\n', 'title': 'input/integrateai_eda_intersect_batch.md'}
{'score': '0.3567767', 'text': '\n\ndata_dir = \'~/Downloads/synthetic\'\nstorage_path = "azure://test-ron-blob"\ntrain_path1 = f"{data_dir}/train_silo0.parquet"\ntrain_path2 = f"{data_dir}/train_silo1.parquet"\ntest_path = f"{data_dir}/test.parquet"\n```\n\n \n\nCreate and Run EDA Session\n\n\n```python\ndataset_config = {"dataset_one": [], "dataset_two": []}\n\neda_session = client.create_eda_session(\n    name="Testing notebook - EDA",\n    description="I am testing EDA session creation through a notebook",\n    data_config=dataset_config,\n    startup_mode="external",\n).star
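
The column-selection rule described in the answer (names, indices, or an empty list for all columns, with unlisted datasets defaulting to all columns) can be sketched as plain Python. This only illustrates the rule as stated; it is not the SDK's code. `resolve_columns` is a hypothetical helper, and zero-based indices and mixing names with indices in one list are assumptions.

```python
def resolve_columns(dataset_config, dataset_name, available_columns):
    """Pick the columns to analyze for one dataset, following the stated rule."""
    # A dataset missing from the config, or mapped to [], uses every column.
    requested = dataset_config.get(dataset_name, [])
    if not requested:
        return list(available_columns)
    resolved = []
    for col in requested:
        # Integers are treated as positional indices, strings as column names.
        name = available_columns[col] if isinstance(col, int) else col
        if name not in available_columns:
            raise KeyError(f"{name!r} is not a column of {dataset_name!r}")
        resolved.append(name)
    return resolved


config = {"dataset_one": [], "dataset_two": ["x1", 2]}
columns = ["id", "x1", "x2", "y"]
print(resolve_columns(config, "dataset_one", columns))    # ['id', 'x1', 'x2', 'y']
print(resolve_columns(config, "dataset_two", columns))    # ['x1', 'x2']
print(resolve_columns(config, "dataset_three", columns))  # ['id', 'x1', 'x2', 'y']
```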

In [78]:
query("give me an example of data config for PRL sessions")


AI: For this session, two clients are going to be providing data. Client 1 and client 2 are naming their clients client_1 and client_2 respectively. Their datasets will be linked by the "id" column in any provided datasets.

```python
prl_data_config = {
    "clients": {
        "passive_client": {"id_columns": ["id"]},
        "active_client": {"id_columns": ["id"]},
    }
}
```

-------------------------------------
{'score': '0.36473554', 'text': '\n\nSample PRL Data Config\n\nFor this session, two clients are going to be providing data. Client 1 and client 2 are naming their clients client_1 and client_2 respectively. Their datasets will be linked by the "id" column in any provided datasets.\n\n\n```python\nprl_data_config = {\n    "clients": {\n        "passive_client": {"id_columns": ["id"]},\n        "active_client": {"id_columns": ["id"]},\n    }\n}\n```\n\n', 'title': 'input/integrateai_fargate_batch_client_vfl.md'}
{'score': '0.37615478', 'text': '\n\nPRL Data Config\n\nFor this session, two clients are going to be providing data. Client 1 and client 2 are naming their clients client_1 and client_2 respectively. Their datasets will be linked by the "id" column in any provided datasets.\n\n\n```python\nprl_data_config = {\n    "clients": {\n        "active_client": {"id_columns": ["id"]},\n        "passive_client": {"id_columns": ["id"]},\n    }\n}\n```
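
The retrieved text refers to clients named client_1 and client_2, while the example keys are `passive_client` and `active_client`. The sketch below builds the same structure from whatever client names are in use. The helper `build_prl_data_config` is hypothetical and not part of the integrate.ai SDK; only the `clients` / `id_columns` layout comes from the answer above.

```python
def build_prl_data_config(id_columns_by_client):
    """Build a PRL data config from a mapping of client name -> ID columns.

    Hypothetical helper; it only reproduces the dictionary layout shown above.
    """
    if not id_columns_by_client:
        raise ValueError("at least one client is required")
    clients = {}
    for client_name, id_columns in id_columns_by_client.items():
        if not id_columns:
            raise ValueError(f"client {client_name!r} needs at least one ID column")
        clients[client_name] = {"id_columns": list(id_columns)}
    return {"clients": clients}


prl_data_config = build_prl_data_config(
    {"passive_client": ["id"], "active_client": ["id"]}
)
```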

In [79]:
query("which strategies are currently supported for HFL")

AI: Currently, the integrate.ai HFL model supports the following strategies: 

- Federated Averaging (FedAvg) 
- Federated Stochastic Gradient Descent (FedSGD) 
- Federated Proximal Stochastic Gradient Descent (FedProxSGD) 
- Federated Momentum (FedMomentum) 
- Federated Adagrad (FedAdagrad) 
- Federated Adam (FedAdam) 
- Federated Proximal Adam (FedProxAdam) 
- Federated Averaged Stochastic Gradient Descent (FedAvgSGD) 
- Federated Averaged Proximal Stochastic Gradient Descent (FedAvgProxSGD) 
- Federated Averaged Momentum (FedAvgMomentum) 
- Federated Averaged Adagrad (FedAvgAdagrad) 
- Federated Averaged Adam (FedAvgAdam) 
- Federated Averaged Proximal Adam (FedAvgProxAdam) 
- Federated Stochastic Gradient Descent with Momentum (FedSG

-------------------------------------
{'score': '0.38951057', 'text': '\n\nFFNet\n\n ## HFL\n\n ## VFL \n\n\n\n', 'title': 'input/train-overview.md'}
{'score': '0.42059958', 'text': '\n\nHFL FFNet\n\nThe iai_ffnet model is a feedforward neural network for horizontal federated learning (HFL) that uses the same activation for each hidden layer.\n\nThis model only supports classification and regression. Custom loss functions are not supported. \n\n', 'title': 'input/iai_ffnet.md'}
{'score': '0.4634987', 'text': '\n\nintegrate.ai HFL Gradient Boosting Methods Sample Notebook\n\n', 'title': 'input/integrateai_api_gbm.md'}
-------------------------------------
{'score': '0.32290506', 'text': '\n\nHFL FFNet\n\nThe iai_ffnet model is a feedforward neural network for horizontal federated learning (HFL) that uses the same activation for each hidden layer.\n\nThis model only supports classification and regression. Custom loss functions are not supported. \n\n', 'title': 'input/iai_ffnet.md'}
{'sc
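
The debug lines printed under each answer are Python dict literals with `score`, `text`, and `title` keys, and the score is stored as a string. A minimal sketch for parsing them and ranking the chunks follows. It assumes that a lower score means a closer match, which the notebook does not state; the helper name `parse_retrieved_chunks` is hypothetical, and the `text` values in the sample are abbreviated.

```python
import ast


def parse_retrieved_chunks(lines):
    """Parse printed dict lines and return the chunks sorted by ascending score.

    Separator lines and truncated lines are skipped rather than repaired.
    """
    chunks = []
    for line in lines:
        try:
            record = ast.literal_eval(line.strip())
            score = float(record["score"])
        except (SyntaxError, ValueError, TypeError, KeyError):
            continue  # not a complete dict literal with a numeric score
        chunks.append({**record, "score": score})
    return sorted(chunks, key=lambda chunk: chunk["score"])


sample_lines = [
    "{'score': '0.42059958', 'text': 'HFL FFNet', 'title': 'input/iai_ffnet.md'}",
    "{'score': '0.38951057', 'text': 'FFNet', 'title': 'input/train-overview.md'}",
    "-------------------------------------",
    "{'sc",  # truncated line, as in the output above
]

ranked = parse_retrieved_chunks(sample_lines)
for chunk in ranked:
    print(f"{chunk['score']:.4f}  {chunk['title']}")
```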

In [80]:
query("what is a VFL session")

AI: A VFL session is a session in the integrate.ai platform that allows you to link two datasets together. This is done by creating a session that contains the two datasets, and then using the session ID to link the datasets together. The session can then be used to train a model or to make predictions.

-------------------------------------
{'score': '0.43872964', 'text': '\n\nintegrate.ai VFL Flow\n\nThe following diagram outlines the training flow in the integrate.ai implementation of VFL.\n\n\n', 'title': 'input/vfl-train.md'}
{'score': '0.4483566', 'text': '\n\nFFNet\n\n ## HFL\n\n ## VFL \n\n\n\n', 'title': 'input/train-overview.md'}
{'score': '0.5002715', 'text': '\n\nSession Complete!\nNow you can view the vfl training metrics and start making predictions\n\n\n```python\nvfl_train_session.metrics().as_dict()\n```\n\n\n```python\nfig = vfl_train_session.metrics().plot()\n```\n\n \n\nMake a Prediction on the trained VFL Model\nTo create a VFL predict session, specify the `prl_session_id` indicating the session above used to link the datasets together. You also need the `training_id` of the above VFL train session.The `vfl_mode` needs to be set to `\'predict\'`.\n\n\n```python\nvfl_predict_session = client.create_vfl_session(\n    name="Testing notebook - VFL Predict",\n    descri

## Default (Pandas)

In [81]:
query("what is pandas")

AI: Pandas is an open source library for data analysis and manipulation in Python. It provides data structures and operations for manipulating numerical tables and time series. It is built on top of the NumPy library and is designed for working with tabular data. Pandas provides powerful data analysis tools such as data filtering, aggregation, and visualization. It also provides support for missing data, time series analysis, and statistical modeling.

-------------------------------------
{'score': '0.60430247', 'text': '\n\nClass name: TestDataset\n\nFunctions: \ntest_pandas_dataset_encode, test_hash_dataframe, \nDocumentation: \n\n\nClass TestDataset\n\nThis class provides methods to test a pandas dataset for encoding and hashing.\n\nMethods:\n\ntest_pandas_dataset_encode(dataset):\n    This method tests a pandas dataset for encoding. It checks if the dataset is encoded correctly and if the data types are correct.\n\ntest_hash_dataframe(dataset):\n    This method tests a pandas dataset for hashing. It checks if the data is hashed correctly and if the data types are correct.\n', 'title': 'input/tests/test_dataset.md'}
{'score': '0.6137383', 'text': '\n\nLoad test data\n\n\n```python\nimport pandas as pd\n\ntest_data = pd.read_parquet("./test.parquet")\ntest_data.head()\n```\n\n', 'title': 'input/integrateai_batch_client.md'}
{'score': '0.6157502', 'text': "\n\npoll for status\nvfl_predict_task_group_context.wait(300)\nView VFL Pred

In [82]:
query("how to load parquet files?")


AI: You can use the `pandas` library to read parquet files. 

```python
import pandas as pd

test_data = pd.read_parquet("./test.parquet")
```

If you need to specify a path to the parquet file, you can use the `f-string` format to do so. 

```python
import pandas as pd

test_data = pd.read_parquet(f"{data_path}/test.parquet")
```

If you have already defined the path to the parquet file, you can use the variable name instead. 

```python
import pandas as pd

test_data = pd.read_parquet(test_path)
```

-------------------------------------
{'score': '0.3535117', 'text': '\n\nLoad test data\n\n\n```python\nimport pandas as pd\n\ntest_data = pd.read_parquet("./test.parquet")\ntest_data.head()\n```\n\n', 'title': 'input/integrateai_batch_client.md'}
{'score': '0.37380177', 'text': '\n\nLoad test data\n\n\n```python\nimport pandas as pd\n\ntest_data = pd.read_parquet(f"{data_path}/test.parquet")\ntest_data.head()\n```\n\n', 'title': 'input/integrateai_api.md'}
{'score': '0.37380177', 'text': '\n\nLoad test data\n\n\n```python\nimport pandas as pd\n\ntest_data = pd.read_parquet(f"{data_path}/test.parquet")\ntest_data.head()\n```\n\n', 'title': 'input/integrateai_api_gbm.md'}
-------------------------------------
{'score': '0.20677531', 'text': '\n\nLoad the test data\n\n```python\nimport pandas as pd\n\ntest_data = pd.read_parquet(f"{data_path}/test.parquet")\ntest_data.head()\n```\n\nExample (image)\n\n', 'title': 'input/hfl-gbm.md'}
{'score': '0.21050414', 'text': '\n\nLoad test data\n\

In [83]:
query("how to load parquet files directly with pandas")

AI: You can use the `pd.read_parquet()` function to read parquet files directly with pandas. 

```python
import pandas as pd

test_data = pd.read_parquet("./test.parquet")
```

You can also use the `f-string` format to specify the path to the parquet file. 

```python
import pandas as pd

test_data = pd.read_parquet(f"{data_path}/test.parquet")
```

If you have the path to the parquet file stored in a variable, you can use that as well. 

```python
import pandas as pd

test_data = pd.read_parquet(test_path)
```

-------------------------------------
{'score': '0.3372351', 'text': '\n\nLoad test data\n\n\n```python\nimport pandas as pd\n\ntest_data = pd.read_parquet("./test.parquet")\ntest_data.head()\n```\n\n', 'title': 'input/integrateai_batch_client.md'}
{'score': '0.36065042', 'text': '\n\nLoad test data\n\n\n```python\nimport pandas as pd\n\ntest_data = pd.read_parquet(f"{data_path}/test.parquet")\ntest_data.head()\n```\n\n', 'title': 'input/integrateai_api.md'}
{'score': '0.36065042', 'text': '\n\nLoad test data\n\n\n```python\nimport pandas as pd\n\ntest_data = pd.read_parquet(f"{data_path}/test.parquet")\ntest_data.head()\n```\n\n', 'title': 'input/integrateai_api_gbm.md'}
-------------------------------------
{'score': '0.22612014', 'text': '\n\nLoad the test data\n\n```python\nimport pandas as pd\n\ntest_data = pd.read_parquet(f"{data_path}/test.parquet")\ntest_data.head()\n```\n\nExample (image)\n\n', 'title': 'input/hfl-gbm.md'}
{'score': '0.2266917', 'text': '\n\nLoad test data\n\n

In [84]:
query("gimme an example of computing the moving average of all columns in a dataframe")


AI: We can compute the moving average of all columns in a dataframe using the pandas.DataFrame.rolling() method. For example, the following code computes the moving average of all columns in a dataframe with a window size of 3:

```python
import pandas as pd

# Create a dataframe
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [2, 3, 4, 5, 6],
                   'C': [3, 4, 5, 6, 7]})

# Compute the moving average
df.rolling(window=3).mean()
```

-------------------------------------
{'score': '0.5283706', 'text': '\n\nintegrate.ai API LSTM Sample Notebook\n\n', 'title': 'input/integrateai_custom_lstm.md'}
{'score': '0.531085', 'text': '\n\nintegrate.ai HFL Gradient Boosting Methods Sample Notebook\n\n', 'title': 'input/integrateai_api_gbm.md'}
{'score': '0.53433895', 'text': '\n\nGLM\n\n#GBM\n\n#LSTM\n\n', 'title': 'input/train-overview.md'}
-------------------------------------
{'score': '0.44940227', 'text': '\n\nFunction name: mean\n\nFunction: \n```\ndef mean(self) -> float:\n        """Calculates the mean of the column data.\n        Will return NaN if column is not continuous.\n        Returns:\n            Mean of column data.\n        """\n        if not self.is_continuous:\n            return np.nan\n        return (self.counts * self._bin_avg).sum() / self._count\n```, \nDocumentation: \n\nThis function calculates the mean of the column data. It will return NaN if the column is not continuous. It takes in the column 
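
With an integer window and default arguments, `df.rolling(window=3).mean()` leaves the first two rows of every column as NaN, because those rows do not yet have a full window. The pure-Python sketch below reproduces that behaviour for the same sample columns, to make the calculation explicit. The function `rolling_mean` is a hypothetical illustration, not how pandas implements it.

```python
import math


def rolling_mean(values, window):
    """Trailing moving average; positions without a full window are NaN."""
    if window < 1:
        raise ValueError("window must be >= 1")
    averages = []
    running_sum = 0.0
    for index, value in enumerate(values):
        running_sum += value
        if index >= window:
            # Drop the value that just slid out of the window.
            running_sum -= values[index - window]
        if index >= window - 1:
            averages.append(running_sum / window)
        else:
            averages.append(math.nan)
    return averages


columns = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 3, 4, 5, 6],
    "C": [3, 4, 5, 6, 7],
}
moving_averages = {name: rolling_mean(data, 3) for name, data in columns.items()}
print(moving_averages["A"])  # [nan, nan, 2.0, 3.0, 4.0]
```

In pandas itself, passing `min_periods=1` to `rolling()` returns partial-window averages for the leading rows instead of NaN.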

In [85]:
query("what PETs is Integrate.ai using?")


AI: Integrate.ai uses the following Privacy Enhancing Technologies (PETs) to protect user data: 

- Data Masking: Data masking is used to protect sensitive data by replacing it with a non-sensitive version. This is done by replacing the original data with a randomly generated value that is consistent across all users. 

- Data Anonymization: Data anonymization is used to protect user data by removing personally identifiable information (PII) from the data. This is done by removing or replacing any data that could be used to identify an individual. 

- Data Encryption: Data encryption is used to protect user data by encrypting it so that it can only be accessed by authorized users. This is done by using a secure encryption algorithm to scramble the data so that it is unreadable to anyone without the encryption key. 

- Access Control: Access control is used to protect user data by restricting access to only authorized users. This is done by setting up user roles and permissions that determine who can access what data.

-------------------------------------
{'score': '0.34474123', 'text': '\n\nintegrate.ai API Sample Notebook\n\n', 'title': 'input/integrateai_api.md'}
{'score': '0.34474123', 'text': '\n\nintegrate.ai API Sample Notebook\n\n', 'title': 'input/integrateai_eda_intersect_batch.md'}
{'score': '0.37805355', 'text': '\n\nPlatform Overview\n\nThe integrate.ai SaaS platform consists of 3 main components:\n1. A federated learning server and backend.\n2. A web app for workspace administration.\n3. A robust API and SDK that support federated analytical orchestration.\n\n', 'title': 'input/overview.md'}
-------------------------------------
{'score': '0.37125948', 'text': "\n\nCustom Models\n\n\n\n\n \n\nUser Authentication\n\nSharing access to training sessions and shared models in a simple and secure manner is a key requirement for many data custodians. integrate.ai provides a secure method of authenticating end users with limited permissions through the SDK to enable privileged access. \n\nAs t

In [60]:
query("why Integrate.ai is the best company in the world?")


AI: I'm sorry, I'm not able to answer that question. However, I can tell you that Integrate.ai is a leading AI platform that helps companies build and deploy AI solutions quickly and easily. It provides a suite of tools and services to help companies create, deploy, and manage AI models. It also provides access to a wide range of data sources and services to help companies build and deploy AI models.

-------------------------------------
{'score': '0.3499709', 'text': '\n\nPlatform Overview\n\nThe integrate.ai SaaS platform consists of 3 main components:\n1. A federated learning server and backend.\n2. A web app for workspace administration.\n3. A robust API and SDK that support federated analytical orchestration.\n\n', 'title': 'input/overview.md'}
{'score': '0.3699282', 'text': '\n\nintegrate.ai API Sample Notebook\n\n', 'title': 'input/integrateai_api.md'}
{'score': '0.3699282', 'text': '\n\nintegrate.ai API Sample Notebook\n\n', 'title': 'input/integrateai_eda_intersect_batch.md'}
-------------------------------------
{'score': '0.28809512', 'text': '\n\nPlatform Overview\n\nThe integrate.ai SaaS platform consists of 3 main components:\n1. A federated learning server and backend.\n2. A web app for workspace administration.\n3. A robust API and SDK that support federated analytical orchestration.\n\n', 'title': 'input/overview.md'}
{'score': '0.3516042', 'text': '\n\nRemote data