# Neptune Analytics as Hybrid Graph/Vector Store

Cognee.ai supports Neptune Analytics as a hybrid adapter that provides both graph and vector storage. This lets Cognee use a single storage backend for graph-based queries and vector-similarity searches.

In this notebook, we demonstrate how to connect to an Amazon Neptune Analytics instance through the Cognee.ai configuration, which uses AWS LangChain and boto3 under the hood to connect to the AWS service.

Apart from the general installation of Cognee.ai, you will need a running Amazon Neptune Analytics instance that your AWS credentials can access.

References:
- [What is Neptune Analytics](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html)
- [Vector Similarity using Neptune Analytics](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/vector-similarity.html)
- [AWS CLI credentials and configuration](https://docs.aws.amazon.com/cli/v1/userguide/cli-chap-configure.html#configure-precedence)

# Prerequisites

## 1. Amazon Neptune Analytics Instance Setup

Create an Amazon Neptune Analytics instance in your AWS account following the [AWS documentation](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/get-started.html). You will also need to configure the following:
- Under `Network and Security` | `enable public connectivity`, allow your graph to be reachable over the internet if accessing from outside a VPC.
- Under `Vector search settings` | `Vector search dimension configuration` | `Use vector dimension`, set the vector dimension. The Neptune Analytics instance must be created with the same number of dimensions that the embedding model produces. See the [vector index documentation](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/vector-index.html). For example, if you use the OpenAI embedding model `openai/text-embedding-3-small`, which produces 1536-dimension embeddings, your Neptune Analytics vector store must be configured to accept 1536-dimension vectors.
- Once the Amazon Neptune Analytics instance is available, you will need the graph-identifier to connect.
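
The dimension requirement above can be checked before running the rest of the notebook. The following is a minimal sketch, assuming the boto3 `neptune-graph` client's `get_graph` call returns a `vectorSearchConfiguration` entry containing a `dimension` field. The helper names `vector_dimension` and `check_graph` are illustrative and not part of Cognee; verify the response shape against the boto3 documentation for your version.

```python
from typing import Optional


def vector_dimension(graph_description: dict) -> Optional[int]:
    """Return the configured vector dimension, or None if vector search is not set up."""
    # Assumption: the description carries {"vectorSearchConfiguration": {"dimension": N}}.
    config = graph_description.get("vectorSearchConfiguration") or {}
    return config.get("dimension")


def check_graph(graph_id: str, expected_dimension: int = 1536) -> None:
    """Raise ValueError if the graph's vector dimension differs from the embedding size."""
    import boto3  # imported lazily so vector_dimension() works without AWS access

    client = boto3.client("neptune-graph")
    description = client.get_graph(graphIdentifier=graph_id)
    actual = vector_dimension(description)
    if actual != expected_dimension:
        raise ValueError(
            f"Graph {graph_id} uses dimension {actual}, "
            f"but the embedding model produces {expected_dimension}"
        )
```

Calling `check_graph("g-your-graph")` with valid credentials would fail early with a readable message instead of failing later during vector inserts.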

## 2. Attach Credentials

Configure AWS credentials that have access to your Amazon Neptune Analytics resources by following the [Configuration and credentials precedence](https://docs.aws.amazon.com/cli/v1/userguide/cli-chap-configure.html#configure-precedence) documentation. One way to do this is to declare environment variables in a `.env` file in the project root directory and load them with `dotenv`.

```
export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key
export AWS_SESSION_TOKEN=your-session-token
export AWS_DEFAULT_REGION=your-region

# Neptune Analytics graph identifier
export AWS_NEPTUNE_ANALYTICS_GRAPH_ID=g-your-graph
```
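
Because a missing variable only surfaces later as a connection or authorization error, it can help to check the environment up front. This is a small sketch, assuming the environment-variable approach shown above; `missing_variables` is a hypothetical helper, and `AWS_SESSION_TOKEN` is left out of the required list because it is only needed for temporary credentials.

```python
import os

# Variables this notebook relies on when credentials come from the environment.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "AWS_NEPTUNE_ANALYTICS_GRAPH_ID",
)


def missing_variables(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_variables()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

If you rely on a shared credentials file or an instance role instead, only the graph identifier check applies.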

The IAM user or role making the request must have a policy attached that allows the following IAM actions on that Neptune Analytics graph, as needed for the operations you run:
```
neptune-graph:ReadDataViaQuery
neptune-graph:WriteDataViaQuery
neptune-graph:DeleteDataViaQuery
```
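
As an illustration, the actions listed above could be granted with a policy document like the one built below. This is a sketch only: the ARN is a placeholder (region, account ID, and graph ID are made up), and the `graph/<graph-id>` resource format is an assumption to confirm against the Neptune Analytics IAM documentation.

```python
import json


def neptune_graph_policy(graph_arn: str) -> dict:
    """Build an IAM policy document allowing data access on one graph."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "neptune-graph:ReadDataViaQuery",
                    "neptune-graph:WriteDataViaQuery",
                    "neptune-graph:DeleteDataViaQuery",
                ],
                "Resource": graph_arn,
            }
        ],
    }


# Placeholder ARN; substitute your own region, account ID, and graph identifier.
policy = neptune_graph_policy(
    "arn:aws:neptune-graph:us-east-1:123456789012:graph/g-your-graph"
)
print(json.dumps(policy, indent=2))
```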

## 3. Configure Cognee.ai

To connect to Amazon Neptune Analytics, set the provider to "neptune_analytics" and supply the graph endpoint URL in both your graph and vector configuration.

```python
import os
import cognee
from dotenv import load_dotenv

# load environment variables from .env
load_dotenv()

graph_identifier = os.getenv('AWS_NEPTUNE_ANALYTICS_GRAPH_ID', "") # graph with 1536 dimensions for vector search

# Configure Neptune Analytics as the graph & vector database provider
cognee.config.set_graph_db_config(
    {
        "graph_database_provider": "neptune_analytics",  # Specify Neptune Analytics as provider
        "graph_database_url": f"neptune-graph://{graph_identifier}",  # Neptune Analytics endpoint with the format neptune-graph://<GRAPH_ID>
    }
)
cognee.config.set_vector_db_config(
    {
        "vector_db_provider": "neptune_analytics",  # Specify Neptune Analytics as provider
        "vector_db_url": f"neptune-graph://{graph_identifier}",  # Neptune Analytics endpoint with the format neptune-graph://<GRAPH_ID>
    }
)
```
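
Note that `os.getenv(..., "")` falls back to an empty string, so an unset variable silently produces the URL `neptune-graph://`. A small guard can make that failure explicit. This is a sketch under the URL format shown above; `neptune_graph_url` is a hypothetical helper, not a Cognee function.

```python
def neptune_graph_url(graph_identifier: str) -> str:
    """Build the neptune-graph://<GRAPH_ID> URL, rejecting an empty identifier."""
    graph_identifier = graph_identifier.strip()
    if not graph_identifier:
        raise ValueError("AWS_NEPTUNE_ANALYTICS_GRAPH_ID is not set")
    return f"neptune-graph://{graph_identifier}"
```

The returned string can then be passed as both `graph_database_url` and `vector_db_url`.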

In [None]:
import os
import pathlib
from cognee import config, add, cognify, search, SearchType, prune, visualize_graph
from dotenv import load_dotenv

## Configuration

With the imports above in place, configure the data directories and the graph and vector providers.
This notebook uses the default OpenAI LLM, so make sure you have an OpenAI API key configured, or configure another LLM.

In [None]:
# load environment variables from file .env
load_dotenv()

current_directory = os.getcwd()

data_directory_path = str(
    pathlib.Path(
        os.path.join(pathlib.Path(current_directory), ".data_storage")
    ).resolve()
)
# Set up the data directory. Cognee will store files here.
config.data_root_directory(data_directory_path)

cognee_directory_path = str(
    pathlib.Path(
        os.path.join(pathlib.Path(current_directory), ".cognee_system")
    ).resolve()
)
# Set up the Cognee system directory. Cognee will store system files and databases here.
config.system_root_directory(cognee_directory_path)

# Set up Amazon credentials in .env file and get the values from environment variables
graph_identifier = os.getenv('AWS_NEPTUNE_ANALYTICS_GRAPH_ID', "")

# Configure Neptune Analytics as the graph & vector database provider
config.set_graph_db_config(
    {
        "graph_database_provider": "neptune_analytics",  # Specify Neptune Analytics as provider
        "graph_database_url": f"neptune-graph://{graph_identifier}",  # Neptune Analytics endpoint with the format neptune-graph://<GRAPH_ID>
    }
)
config.set_vector_db_config(
    {
        "vector_db_provider": "neptune_analytics",  # Specify Neptune Analytics as provider
        "vector_db_url": f"neptune-graph://{graph_identifier}",  # Neptune Analytics endpoint with the format neptune-graph://<GRAPH_ID>
    }
)

## Clean up environment

Prune existing data and system metadata so the notebook starts from a fresh state.

In [None]:
# Prune data and system metadata before running, only if we want "fresh" state.
await prune.prune_data()
await prune.prune_system(metadata=True)

## Setup data and cognify

Create a dataset containing Neptune descriptions, add it to Cognee, and run cognify on it.

In [None]:
# Add sample text to the dataset
sample_text_1 = """Neptune Analytics is a memory-optimized graph database engine for analytics. With Neptune
    Analytics, you can get insights and find trends by processing large amounts of graph data in seconds. To analyze
    graph data quickly and easily, Neptune Analytics stores large graph datasets in memory. It supports a library of
    optimized graph analytic algorithms, low-latency graph queries, and vector search capabilities within graph
    traversals.
    """

sample_text_2 = """Neptune Analytics is an ideal choice for investigatory, exploratory, or data-science workloads
    that require fast iteration for data, analytical and algorithmic processing, or vector search on graph data. It
    complements Amazon Neptune Database, a popular managed graph database. To perform intensive analysis, you can load
    the data from a Neptune Database graph or snapshot into Neptune Analytics. You can also load graph data that's
    stored in Amazon S3.
    """

# Create a dataset
dataset_name = "neptune_descriptions"

# Add the text data to Cognee.
await add([sample_text_1, sample_text_2], dataset_name)

# Cognify the text data.
await cognify([dataset_name])

## Graph Memory visualization

Render the graph stored in Neptune Analytics and save the visualization to `.artifacts/graph_visualization.html`.

![visualization](./neptune_analytics_demo.png)

In [None]:
# Get a graphistry url (Register for a free account at https://www.graphistry.com)
# url = await render_graph()
# print(f"Graphistry URL: {url}")

# Or use our simple graph preview
graph_file_path = str(
    pathlib.Path(
        os.path.join(pathlib.Path(current_directory), ".artifacts/graph_visualization.html")
    ).resolve()
)
await visualize_graph(graph_file_path)

## SEARCH: Graph Completion

Search using the query "What is Neptune Analytics?" and return the graph completion with nodes/edges related to the query.

In [None]:
# Completion query that uses graph data to form context.
graph_completion = await search(query_text="What is Neptune Analytics?", query_type=SearchType.GRAPH_COMPLETION)
print("\nGraph completion result is:")
print(graph_completion)

## SEARCH: RAG Completion

Search using the query "What is Neptune Analytics?" and return an LLM-based completion that uses document chunks as context.

In [None]:
# Completion query that uses document chunks to form context.
rag_completion = await search(query_text="What is Neptune Analytics?", query_type=SearchType.RAG_COMPLETION)
print("\nRAG Completion result is:")
print(rag_completion)

## SEARCH: Graph Insights

Search for insight relationships related to "Neptune Analytics" as a context.

In [None]:
# Search graph insights
insights_results = await search(query_text="Neptune Analytics", query_type=SearchType.INSIGHTS)
print("\nInsights about Neptune Analytics:")
for result in insights_results:
    src_node = result[0].get("name", result[0]["type"])
    tgt_node = result[2].get("name", result[2]["type"])
    relationship = result[1].get("relationship_name", "__relationship__")
    print(f"- {src_node} -[{relationship}]-> {tgt_node}")
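
The loop above treats each result as a `(source, relationship, target)` triple of dictionaries. The sketch below shows the same formatting as a standalone function, with a fallback when neither `name` nor `type` is present. The sample triple is invented for illustration and is not real search output.

```python
def format_triplet(result) -> str:
    """Format a (source, relationship, target) triple as 'src -[rel]-> tgt'."""
    source, edge, target = result
    src_node = source.get("name", source.get("type", "<unknown>"))
    tgt_node = target.get("name", target.get("type", "<unknown>"))
    relationship = edge.get("relationship_name", "__relationship__")
    return f"{src_node} -[{relationship}]-> {tgt_node}"


# Hypothetical triple, shaped like the results the loop above expects.
sample = (
    {"name": "neptune analytics", "type": "Entity"},
    {"relationship_name": "complements"},
    {"name": "amazon neptune database", "type": "Entity"},
)
print(f"- {format_triplet(sample)}")
```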

## SEARCH: Entity Summaries

Search for summary nodes related to "Neptune Analytics" as a context.

In [None]:
# Query all summaries related to query.
summaries = await search(query_text="Neptune Analytics", query_type=SearchType.SUMMARIES)
print("\nSummary results are:")
for summary in summaries:
    summary_type = summary["type"]  # avoid shadowing the built-in `type`
    summary_text = summary["text"]
    print(f"- {summary_type}: {summary_text}")

## SEARCH: Chunks

Search for chunk nodes related to "Neptune Analytics" as a context.

In [None]:
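# Query all document chunks related to the query.
chunks = await search(query_text="Neptune Analytics", query_type=SearchType.CHUNKS)
print("\nChunk results are:")
for chunk in chunks:
    chunk_type = chunk["type"]  # avoid shadowing the built-in `type`
    chunk_text = chunk["text"]
    print(f"- {chunk_type}: {chunk_text}")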
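
The summary and chunk cells print their results the same way, so the formatting could be shared. This is a minimal sketch, assuming each result is a dictionary with `type` and `text` keys as in the loops above; `format_results` is a hypothetical helper and the sample items are made up.

```python
def format_results(results):
    """Return one '- <type>: <text>' line per result dictionary."""
    return [f"- {item['type']}: {item['text']}" for item in results]


# Made-up items shaped like the dictionaries the loops above read.
sample_results = [
    {"type": "DocumentChunk", "text": "Neptune Analytics stores graph data in memory."},
    {"type": "TextSummary", "text": "An in-memory graph analytics engine."},
]
for line in format_results(sample_results):
    print(line)
```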