
jina-embeddings-v3 is a multilingual, multi-task text embedding model designed for a variety of NLP applications. It is based on the Jina-XLM-RoBERTa architecture and uses Rotary Position Embeddings to handle input sequences of up to 8192 tokens. It also provides five LoRA adapters that generate task-specific embeddings efficiently.
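
The `EmbeddingGenerator` imported below is project code that this notebook does not show. As a rough sketch of how task-specific adapters might be routed, assuming a LangChain-style `embed_documents` / `embed_query` interface: the task names are the five listed on the jina-embeddings-v3 model card, while `TaskAwareEmbeddings`, `encode_fn`, and `fake_encode` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Sequence

# The five task adapters named on the jina-embeddings-v3 model card.
JINA_V3_TASKS = (
    "retrieval.query",
    "retrieval.passage",
    "separation",
    "classification",
    "text-matching",
)

# Hypothetical encoder signature: (texts, task) -> one vector per text.
EncodeFn = Callable[[Sequence[str], str], List[List[float]]]


class TaskAwareEmbeddings:
    """Hypothetical wrapper that picks an adapter per call type."""

    def __init__(self, encode_fn: EncodeFn):
        self._encode = encode_fn

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Stored chunks use the passage-side retrieval adapter.
        return self._encode(texts, "retrieval.passage")

    def embed_query(self, text: str) -> List[float]:
        # Search queries use the query-side retrieval adapter.
        return self._encode([text], "retrieval.query")[0]


def fake_encode(texts: Sequence[str], task: str) -> List[List[float]]:
    """Stand-in encoder so the sketch runs without downloading a model."""
    if task not in JINA_V3_TASKS:
        raise ValueError(f"unknown task: {task}")
    return [[float(len(t)), float(JINA_V3_TASKS.index(task))] for t in texts]


embeddings = TaskAwareEmbeddings(fake_encode)
print(embeddings.embed_query("abc"))
print(embeddings.embed_documents(["a", "bb"]))
```

In a real setup, `encode_fn` would wrap the actual model call; the fake encoder only makes the adapter routing visible.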

# New Tests fixed

In [1]:
import sys
import os

# Get the project's root directory
project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))

# Add the project root to the Python path if it's not already there
if project_root not in sys.path:
    sys.path.append(project_root)

# Now you can import directly
from app.utils.embedding import EmbeddingGenerator


In [2]:
import google.generativeai as genai
from langchain_postgres.vectorstores import PGVector

In [3]:
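# Read the key from the environment instead of hard-coding it in the notebook.
# GEMINI_API_KEY is an assumed variable name; use whatever your deployment defines.
api_key = os.environ["GEMINI_API_KEY"]
genai.configure(api_key=api_key)  # configure() returns None, so there is nothing to assign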

In [4]:
connection_string = "postgresql+psycopg://postgres:postgres@localhost:5432/app"
collection_name = "Data Source Connection"
query = "how do i connect my aws account for billing information where i havent configured the data export"
# query = "How to connect gcp data source?"

In [5]:
# Instantiate the embeddings class
embedding_function = EmbeddingGenerator()

In [8]:
# Define the similarity search function
def perform_similarity_search(query: str, top_k: int = 4):
    """Return up to top_k (document, relevance score) pairs for chunks whose source is AWS."""
    vector_store = PGVector(
        embeddings=embedding_function,
        connection=connection_string,
        collection_name=collection_name,
        use_jsonb=True,
    )
    
    results = vector_store.similarity_search_with_relevance_scores(
        query,
        k=top_k,
        score_threshold=0.5,
        filter={"source": "AWS"}
    )
    return results
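
To make the roles of `k`, `score_threshold`, and the returned `(document, score)` pairs concrete, here is a minimal pure-Python sketch of threshold-then-top-k selection. It uses cosine similarity directly as the relevance score, which is an assumption: the actual score returned by the vector store depends on its distance strategy, and the helper names below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_above_threshold(
    query_vec: Sequence[float],
    docs: Sequence[Tuple[str, Sequence[float]]],
    k: int = 4,
    score_threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Score every document, drop low scores, return the k best, highest first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    kept = [(text, score) for text, score in scored if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Toy 2-dimensional "embeddings" for illustration only.
docs = [
    ("aws billing export", [1.0, 0.0]),
    ("gcp data source", [0.0, 1.0]),
    ("aws iam user", [1.0, 1.0]),
]
for text, score in top_k_above_threshold([1.0, 0.0], docs):
    print(f"{score:.3f}  {text}")
```

With a threshold of 0.5, a search can legitimately return fewer than `k` results, or none at all, so downstream code should handle an empty list.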


In [None]:
# Cell 6: Execute the search and print results
# try:
#     results = perform_similarity_search(query, top_k=4)
#     print("\nSimilarity Search Results:")
#     for document, score in results:
#         print("Document:", document)
#         print("Relevance Score:", score, "\n")
# except Exception as e:
#     print(f"An error occurred during similarity search: {e}")


Similarity Search Results:
Relevance Score: 0.7527442744638989 

Document: page_content='Insert the JSON code on the Type or paste a JSON policy document step: { "Version": "2012-10-17", "Statement": [ { "Sid": "CloudtunerOperations", "Effect": "Allow", "Action": [ "s3:GetBucketPublicAccessBlock", "s3:GetBucketPolicyStatus", "s3:GetBucketTagging", "iam:GetAccessKeyLastUsed", "cloudwatch:GetMetricStatistics", "s3:GetBucketAcl", "ec2:Describe*", "s3:ListAllMyBuckets", "iam:ListUsers", "s3:GetBucketLocation", "iam:GetLoginProfile", "cur:DescribeReportDefinitions", "iam:ListAccessKeys" ], "Resource": "*" } ] } 4. Create user and grant policies: Go to Identity and Access Management (IAM) → Users → create a new user. In Step 2. Set permissions, select Attach policies directly → attach the policies created earlier. Confirm the creation of the user. 5. Create access key: Go to Identity and Access Management (IAM) → Users → select the created user → create an access key Download the .csv file 

# Gemini Add 

In [9]:
results = perform_similarity_search(query, top_k=4)
merged_content = "\n\n".join([doc.page_content for doc, _ in results])
print(merged_content)


Insert the JSON code on the Type or paste a JSON policy document step: { "Version": "2012-10-17", "Statement": [ { "Sid": "CloudtunerOperations", "Effect": "Allow", "Action": [ "s3:GetBucketPublicAccessBlock", "s3:GetBucketPolicyStatus", "s3:GetBucketTagging", "iam:GetAccessKeyLastUsed", "cloudwatch:GetMetricStatistics", "s3:GetBucketAcl", "ec2:Describe*", "s3:ListAllMyBuckets", "iam:ListUsers", "s3:GetBucketLocation", "iam:GetLoginProfile", "cur:DescribeReportDefinitions", "iam:ListAccessKeys" ], "Resource": "*" } ] } 4. Create user and grant policies: Go to Identity and Access Management (IAM) → Users → create a new user. In Step 2. Set permissions, select Attach policies directly → attach the policies created earlier. Confirm the creation of the user. 5. Create access key: Go to Identity and Access Management (IAM) → Users → select the created user → create an access key Download the .csv file with Access key and Secret access key. Connect to Cloudtuner# Once the user is configured

In [10]:

prompt_data = {
    "content": merged_content,
    # prompt.txt describes this field as "query"; "question" is the key used in this run.
    "question": query
}
print(prompt_data)



In [11]:
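import json
# ensure_ascii=False keeps characters such as → readable instead of escaping them as \u2192
combined_prompt = json.dumps(prompt_data, indent=2, ensure_ascii=False)
print(combined_prompt)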

{
  "question": "how do i connect my aws account for billing information where i havent configured the data export"
}
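
A small sketch of how the JSON payload could be built so that it matches the input shape described in the system prompt loaded later in this notebook (`content` and `query` keys). It also shows that `json.dumps` escapes non-ASCII characters by default, so arrows such as → in the retrieved documentation reach the model as `\u2192` sequences unless `ensure_ascii=False` is set. The function name `build_prompt` is hypothetical.

```python
import json


def build_prompt(content: str, query: str) -> str:
    """Serialize the RAG context and user question as a JSON string."""
    payload = {"content": content, "query": query}
    # ensure_ascii=False leaves characters like → untouched.
    return json.dumps(payload, indent=2, ensure_ascii=False)


sample = "Go to IAM → Users → create a new user."

escaped = json.dumps({"content": sample})                 # default: ASCII-only output
readable = build_prompt(sample, "how do I create a user?")

print(escaped)
print(readable)
```

Both strings decode to the same data with `json.loads`; the difference only matters because the model sees the raw text.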


In [10]:
from google import genai
from google.genai import types

In [11]:
client = genai.Client(api_key=api_key)

In [12]:
# Read the content of prompt.txt (UTF-8, since the prompt contains non-ASCII characters)
with open("prompt.txt", "r", encoding="utf-8") as file:
    prompt_content = file.read()

print(prompt_content)

You are a FinOps assistant for a SaaS platform that helps users optimize cloud infrastructure costs and improve security. Based on the following input:

{
  "content": "<Relevant knowledge provided from RAG similarity search>",
  "query": "<User's question>"
}

Follow these instructions strictly:

1. **Analyze** the entire content context deeply. Extract all necessary and actionable information to answer the query accurately.
2. **Preserve and render** any technical artifacts such as JSON policies, configurations, or code blocks with correct formatting. Never alter syntactic integrity.
3. **Maintain clarity and structure** by segmenting the output into distinct, logically ordered sections, making the response digestible for both technical and business users.
4. **Do not hallucinate**. Only use information present in the `content`. If any required data is missing or ambiguous, explicitly mention it.
5. **Avoid redundancy**. Do not repeat details across sections unless necessary for clar

In [13]:
system_instruction = prompt_content

In [14]:
response = client.models.generate_content(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction=system_instruction,
    ),
    contents=combined_prompt,
)
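
For reuse outside the notebook, the call above could be wrapped in a small function. This is a sketch under assumptions: `ask_model` is a hypothetical helper, the config is passed as a plain dict (which the google-genai SDK is understood to accept in place of `types.GenerateContentConfig`; if your version does not, build the config object as in the cell above), and the fake client exists only so the example runs without network access or an API key.

```python
from types import SimpleNamespace


def ask_model(client, system_instruction: str, prompt: str,
              model: str = "gemini-2.0-flash") -> str:
    """Send one prompt with a system instruction and return the answer text."""
    response = client.models.generate_content(
        model=model,
        config={"system_instruction": system_instruction},  # assumed dict form
        contents=prompt,
    )
    text = response.text
    if not text:
        # Surface empty answers early instead of printing "None" downstream.
        raise RuntimeError("model returned no text")
    return text


class _FakeModels:
    """Stand-in for client.models, for offline illustration only."""

    def generate_content(self, *, model, config, contents):
        return SimpleNamespace(text=f"[{model}] {contents}")


fake_client = SimpleNamespace(models=_FakeModels())
print(ask_model(fake_client, "You are a FinOps assistant.", "hello"))
```

Passing the client in as an argument keeps the function testable with a stand-in like the one above.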

In [15]:
print(response.text)

Okay, I will provide a detailed guide on connecting your AWS account to Cloudtuner for billing information when you haven't configured data export yet.

**🔹 Summary**

To connect your AWS account to Cloudtuner for billing, you'll need to configure a Data Export in AWS, create specific IAM policies and a user with those policies, and then connect this user to Cloudtuner, providing the necessary credentials and export details. Since you haven't configured a data export yet, you'll need to start there.

**📘 Detailed Insights**

The process involves setting up a data export in AWS, configuring the necessary IAM policies and user, and then connecting to Cloudtuner.  Here's a breakdown:

1.  **Root Account – Data Export Not Configured Yet:**
    *   Creating a data export is available only for the Root cloud account (payer). Linked accounts will receive billing data through the main account's invoice.
    *   Navigate to AWS Billing & Cost Management → Data Exports.
2.  **Create a New D

In [16]:

stream_response = client.models.generate_content_stream(
    model="gemini-2.0-flash",
    config=types.GenerateContentConfig(
        system_instruction=system_instruction,
    ),
    contents=combined_prompt,
)
for chunk in stream_response:
    # A chunk without text parts may have no text; print nothing for it.
    print(chunk.text or "", end="")
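
When the streamed answer is needed afterwards (for logging or post-processing), the chunks can be accumulated while they are printed. A minimal sketch, assuming each chunk exposes a `text` attribute that may be empty; `collect_stream` is a hypothetical helper and the `SimpleNamespace` objects stand in for real stream chunks.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Print chunks as they arrive and return the full concatenated text."""
    parts = []
    for chunk in chunks:
        text = getattr(chunk, "text", None)
        if not text:
            continue  # skip chunks that carry no text
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)


# Stand-in chunks for illustration; a real call would pass stream_response.
demo_chunks = [
    SimpleNamespace(text="### Summary\n"),
    SimpleNamespace(text=None),
    SimpleNamespace(text="Configure a data export first."),
]
full_text = collect_stream(demo_chunks)
```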

### 🔹 Summary
To connect your AWS account for billing information when you haven't configured data export, you need to first configure a data export in AWS. Then, create the necessary IAM user and policies, and finally connect to Cloudtuner using the created credentials and data export details.

### 📘 Detailed Insights

1.  **Root Account - Data Export Not Configured Yet:**
    *   Creating a data export is available only for the Root cloud account (payer). Linked accounts will receive billing data through the main account's invoice.
    *   To use automatic/manual billing data import in Cloudtuner, create a Data Export in AWS.
    *   Navigate to AWS Billing & Cost Management → Data Exports.
2.  **Create a New Data Export:** You can choose between **Standard** or **Legacy CUR Export.**
    *   **Standard Data Export:**
        1.  Select "Standard data export" as the export type.
        2.  Enter the export name.
        3.  Select "CUR 2.0", check the "Include resource IDs" checkbox, and choose the time granularity for aggregation.
        4.  Select "Overwrite existing data export file" and choose the compression type.
        5.  Set the data export storage setting: Create a new or use an existing bucket, and enter the S3 path prefix.
        6.  Confirm export creation.
    *   **Legacy CUR Export:**
        1.  Select "Legacy CUR export (CUR)" as the export type.
        2.  Enter the export name.
        3.  Select the "Include resource IDs" and "Refresh automatically" checkboxes.
        4.  Set the data export delivery options: choose the time granularity, select "Overwrite existing report", and choose the compression type.
        5.  Set the data export storage setting: Create a new bucket or use an existing one and enter the S3 path prefix.
        6.  Confirm export creation.
3.  **Configure Policies and User:**
    *   **Update Bucket Policy:**  Add a new statement to the bucket policy to allow AWS Data Exports to write to S3.
    *   **Create User Policies:**
        *   Create user policies for Discover Resources and ReadOnly access. You will need to create a user and grant policies.
        *   Go to Identity and Access Management (IAM) → Users → create a new user.
        *   In Step 2. Set permissions, select Attach policies directly → attach the policies created earlier.
        *   Confirm the creation of the user.
        *   Create an access key: Go to Identity and Access Management (IAM) → Users → select the created user → create an access key.
        *   Download the .csv file with Access key and Secret access key.
4.  **Connect to Cloudtuner:**
    *   Go to Cloudtuner → Data Sources → click the Add button → select AWS Root.
    *   Fill in the fields:
        *   Provide user credentials: AWS Access key ID → Access key, AWS Secret access key → Access key secret.
        *   Select Export type as in the report configured earlier: AWS Billing and Cost Management → Data Exports → find the report configured earlier → export type.
        *   Switch off Automatically detect existing Data Exports.
        *   Select Connect only to data in bucket.
        *   Provide Data Export parameters:
            *   Export Name: AWS Billing and Cost Management → Data Exports table → Export name column.
            *   Export Amazon S3 Bucket Name: AWS Billing and Cost Management → Data Exports table → S3 bucket column.
            *   Export path prefix: AWS Billing and Cost Management → Data Exports table → click on Export name → Edit → Data export storage settings → S3 destination → last folder name (without “/”). For example, for the S3 destination s3://aqa-bill-bucket/report-cur2, enter report-cur2 into the field.
    *   Click Connect when done.

### ✅ Actionable Steps / Recommendations

1.  **Configure Data Export in AWS:** Follow the steps in the section "Root account – Data Export not configured yet" to create either a Standard or Legacy CUR export.
2.  **Update Bucket Policy:** Navigate to the Permissions tab of your AWS S3 bucket and add a new statement with the below JSON, replacing `<bucket_name>` with your bucket name and `<AWS account ID>` with your AWS account ID:

    ```json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "EnableAWSDataExportsToWriteToS3AndCheckPolicy",
          "Effect": "Allow",
          "Principal": {
            "Service": [
              "billingreports.amazonaws.com",
              "bcm-data-exports.amazonaws.com"
            ]
          },
          "Action": [
            "s3:PutObject",
            "s3:GetBucketPolicy"
          ],
          "Resource": [
            "arn:aws:s3:::<bucket_name>/*",
            "arn:aws:s3:::<bucket_name>"
          ],
          "Condition": {
            "StringLike": {
              "aws:SourceAccount": "<AWS account ID>",
              "aws:SourceArn": [
                "arn:aws:cur:us-east-1:<AWS account ID>:definition/*",
                "arn:aws:bcm-data-exports:us-east-1:<AWS account ID>:export/*"
              ]
            }
          }
        }
      ]
    }
    ```
3.  **Create User and Policies:** Create a new IAM user and attach the following policies. Use the provided JSON code snippets when prompted in the IAM creation process.

    *   **ReadOnly Access Policy:**  Replace `<bucket_name>` with your bucket name.

    ```json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "ReportDefinition",
          "Effect": "Allow",
          "Action": ["cur:DescribeReportDefinitions"],
          "Resource": "*"
        },
        {
          "Sid": "GetObject",
          "Effect": "Allow",
          "Action": ["s3:GetObject"],
          "Resource": "arn:aws:s3:::<bucket_name>/*"
        },
        {
          "Sid": "BucketOperations",
          "Effect": "Allow",
          "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
          "Resource": "arn:aws:s3:::<bucket_name>"
        }
      ]
    }
    ```
    *   **Discover Resources Policy:**

    ```json
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "CloudtunerOperations",
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketPublicAccessBlock",
            "s3:GetBucketPolicyStatus",
            "s3:GetBucketTagging",
            "iam:GetAccessKeyLastUsed",
            "cloudwatch:GetMetricStatistics",
            "s3:GetBucketAcl",
            "ec2:Describe*",
            "s3:ListAllMyBuckets",
            "iam:ListUsers",
            "s3:GetBucketLocation",
            "iam:GetLoginProfile",
            "cur:DescribeReportDefinitions",
            "iam:ListAccessKeys"
          ],
          "Resource": "*"
        }
      ]
    }
    ```
4.  **Create Access Key:** For the IAM user, create an access key and download the .csv file containing the Access key ID and Secret access key.
5.  **Connect to Cloudtuner:** Navigate to Cloudtuner and add a new AWS Root data source. Provide the Access Key ID, Secret Access Key, Export Name, S3 Bucket Name, and Export path prefix.
6.  **Connect to Cloudtuner:**  Go to Cloudtuner → Data Sources → click the Add button → select AWS Root and fill in all the required fields.

### 📦 Policy / Configuration

*   **Bucket Policy:**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnableAWSDataExportsToWriteToS3AndCheckPolicy",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "billingreports.amazonaws.com",
          "bcm-data-exports.amazonaws.com"
        ]
      },
      "Action": [
        "s3:PutObject",
        "s3:GetBucketPolicy"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket_name>/*",
        "arn:aws:s3:::<bucket_name>"
      ],
      "Condition": {
        "StringLike": {
          "aws:SourceAccount": "<AWS account ID>",
          "aws:SourceArn": [
            "arn:aws:cur:us-east-1:<AWS account ID>:definition/*",
            "arn:aws:bcm-data-exports:us-east-1:<AWS account ID>:export/*"
          ]
        }
      }
    }
  ]
}
```

*   **ReadOnly Access Policy:**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReportDefinition",
      "Effect": "Allow",
      "Action": ["cur:DescribeReportDefinitions"],
      "Resource": "*"
    },
    {
      "Sid": "GetObject",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::<bucket_name>/*"
    },
    {
      "Sid": "BucketOperations",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::<bucket_name>"
    }
  ]
}
```

*   **Discover Resources Policy:**

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudtunerOperations",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketPublicAccessBlock",
        "s3:GetBucketPolicyStatus",
        "s3:GetBucketTagging",
        "iam:GetAccessKeyLastUsed",
        "cloudwatch:GetMetricStatistics",
        "s3:GetBucketAcl",
        "ec2:Describe*",
        "s3:ListAllMyBuckets",
        "iam:ListUsers",
        "s3:GetBucketLocation",
        "iam:GetLoginProfile",
        "cur:DescribeReportDefinitions",
        "iam:ListAccessKeys"
      ],
      "Resource": "*"
    }
  ]
}
```

### ⚠️ Notes & Caveats

*   AWS requires up to 24 hours to prepare the data export after configuration.
*   Cloudtuner updates or creates a new export file once a day. If the export file isn't in the specified bucket under the specified prefix, the export will fail.
*   When connecting the root account without connecting the linked accounts, expenses from the unconnected linked accounts will be ignored.
*   The bucket name in the policies must match the S3 bucket created for data exports.
*   The AWS Account ID should be 12 digits without hyphens.
*   If you encounter any issues, contact the support team at support@hystax.com.
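
Several values in the generated answer above are easy to get wrong by hand: the 12-digit account ID, the export path prefix (the last folder name of the S3 destination, without “/”), and the `<bucket_name>` / `<AWS account ID>` placeholders in the policies. The following is a rough sketch of pre-flight checks for those values. All function names are hypothetical, the template is a shortened excerpt rather than a complete policy, and the account ID is a dummy value.

```python
import json
import re
from urllib.parse import urlparse

ACCOUNT_ID_RE = re.compile(r"\d{12}")


def is_valid_account_id(account_id: str) -> bool:
    """True for exactly 12 digits, with no hyphens or spaces."""
    return ACCOUNT_ID_RE.fullmatch(account_id) is not None


def export_path_prefix(s3_destination: str) -> str:
    """Return the last folder name of an s3:// destination, without '/'."""
    parsed = urlparse(s3_destination)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 destination: {s3_destination!r}")
    folders = [part for part in parsed.path.split("/") if part]
    return folders[-1] if folders else ""


def fill_policy(template: str, bucket_name: str, account_id: str) -> dict:
    """Substitute the placeholders and confirm the result is valid JSON."""
    if not is_valid_account_id(account_id):
        raise ValueError("AWS account ID must be 12 digits without hyphens")
    filled = (
        template.replace("<bucket_name>", bucket_name)
        .replace("<AWS account ID>", account_id)
    )
    return json.loads(filled)


# Shortened excerpt for illustration, not a complete policy document.
TEMPLATE = """
{
  "Resource": ["arn:aws:s3:::<bucket_name>/*", "arn:aws:s3:::<bucket_name>"],
  "Condition": {"StringLike": {"aws:SourceAccount": "<AWS account ID>"}}
}
"""

policy = fill_policy(TEMPLATE, "aqa-bill-bucket", "123456789012")
print(export_path_prefix("s3://aqa-bill-bucket/report-cur2"))
print(policy["Resource"][0])
```

Parsing the filled template with `json.loads` catches a stray quote or leftover placeholder syntax before the policy is pasted into the AWS console.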