# Hybrid Search Demo
## Create Graph Schema

In [2]:
import os
os.environ["TG_HOST"] = "http://127.0.0.1"
os.environ["TG_USERNAME"] = "tigergraph"
os.environ["TG_PASSWORD"] = "tigergraph"

In [3]:
graph_schema = {
    "graph_name": "KGRec",
    "nodes": {
        "User": {
            "primary_key": "id",
            "attributes": {
                "id": "INT",
            },
        },
        "Song": {
            "primary_key": "id",
            "attributes": {
                "id": "INT",
                "description": "STRING",
            },
            "vector_attributes": {"emb_1": 1536},
        },
    },
    "edges": {
        "downloaded": {
            "is_directed_edge": False,
            "from_node_type": "User",
            "to_node_type": "Song",
        },
        "similar_to": {
            "is_directed_edge": False,
            "from_node_type": "Song",
            "to_node_type": "Song",
            "attributes": {
                "score": "DOUBLE",
            },
        },
    },
}
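
Before handing a schema dictionary like the one above to `Graph`, it can help to confirm that every edge points at node types that are actually declared. The sketch below is a plain-Python check on the dictionary layout shown in this notebook; `check_edge_endpoints` is a hypothetical helper written for illustration and is not part of tigergraphx.

```python
def check_edge_endpoints(schema):
    """Return (edge_name, node_type) pairs for edge endpoints that are not declared under "nodes"."""
    node_types = set(schema.get("nodes", {}))
    problems = []
    for edge_name, edge in schema.get("edges", {}).items():
        for endpoint_key in ("from_node_type", "to_node_type"):
            node_type = edge.get(endpoint_key)
            if node_type not in node_types:
                problems.append((edge_name, node_type))
    return problems


# Trimmed copy of the schema above, kept local so the sketch is self-contained
example_schema = {
    "graph_name": "KGRec",
    "nodes": {
        "User": {"primary_key": "id", "attributes": {"id": "INT"}},
        "Song": {"primary_key": "id", "attributes": {"id": "INT", "description": "STRING"}},
    },
    "edges": {
        "downloaded": {"is_directed_edge": False, "from_node_type": "User", "to_node_type": "Song"},
        "similar_to": {"is_directed_edge": False, "from_node_type": "Song", "to_node_type": "Song"},
    },
}

# An empty list means every edge endpoint refers to a declared node type
print(check_edge_endpoints(example_schema))
```

An edge that refers to an undeclared type (for example a made-up `"Artist"`) would show up in the returned list alongside the edge name.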

In [4]:
from tigergraphx import Graph
G = Graph(graph_schema)

## Load Data

In [5]:
loading_job_config = {
    "loading_job_name": "loading_job",
    "files": [
        {
            "file_alias": "f_song",
            "file_path": "/home/tigergraph/data/KGRec/song_embeddings.csv",
            "csv_parsing_options": {
                "separator": ",",
                "header": True,
            },
            "node_mappings": [
                {
                    "target_name": "Song",
                    "attribute_column_mappings": {
                        "id": "item_id",
                        "description": "description",
                        "emb_1": 'SPLIT($"embedding", " ")',
                    },
                }
            ],
        },
        {
            "file_alias": "f_downloads",
            "file_path": "/home/tigergraph/data/KGRec/implicit_lf_dataset.csv",
            "csv_parsing_options": {
                "separator": "\t",
                "header": False,
            },
            "node_mappings": [
                {
                    "target_name": "User",
                    "attribute_column_mappings": {
                        "id": 0,
                    },
                },
                {
                    "target_name": "Song",
                    "attribute_column_mappings": {
                        "id": 1,
                    },
                }
            ],
            "edge_mappings": [
                {
                    "target_name": "downloaded",
                    "source_node_column": 0,
                    "target_node_column": 1,
                }
            ],
        },
        {
            "file_alias": "f_similar_to",
            "file_path": "/home/tigergraph/data/KGRec/similar_songs.csv",
            "csv_parsing_options": {
                "separator": ",",
                "header": True,
            },
            "edge_mappings": [
                {
                    "target_name": "similar_to",
                    "source_node_column": "song_id_1",
                    "target_node_column": "song_id_2",
                    "attribute_column_mappings": {
                        "score": "similarity_score",
                    },
                }
            ],
        },
    ],
}
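
The `SPLIT($"embedding", " ")` mapping above implies that the `embedding` column holds all vector components in a single CSV field, separated by spaces. The actual `song_embeddings.csv` file is not shown in this notebook, so the following is only a sketch of how such a row could be produced with the standard library; the column names come from the mapping, while `embedding_to_field` and the toy values are assumptions for illustration.

```python
import csv
import io


def embedding_to_field(embedding):
    """Join the components with single spaces so the field can be split on " "."""
    return " ".join(repr(float(x)) for x in embedding)


# Write one toy row to an in-memory buffer instead of a real file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["item_id", "description", "embedding"])
writer.writeheader()
writer.writerow(
    {
        "item_id": 1,
        "description": "A toy description, with a comma",
        "embedding": embedding_to_field([0.1, -0.25, 3.0]),
    }
)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the vector components are separated by spaces rather than commas, the embedding stays inside one field, and the `csv` module quotes the description automatically when it contains the `,` separator.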

In [6]:
G.load_data(loading_job_config)

2025-03-07 17:32:41,727 - tigergraphx.core.managers.data_manager - INFO - Initiating data load for job: loading_job...
2025-03-07 17:32:54,766 - tigergraphx.core.managers.data_manager - INFO - Data load completed successfully.


## Graph-based Similarity Search

In [7]:
graph_search_results = G.run_query("graph_based_similarity_search", params={"input": 17418216, "k": 4})
for result in graph_search_results:
    for key, songs in result.items():
        for song in songs:
            print(song)

{'v_id': '4425', 'v_type': 'Song', 'attributes': {'id': 4425, 'description': "Thousand Foot Krutch vocalist Trevor McNevan -LRB- from NewReleaseTuesday -RRB- : `` This is another firecracker , more of an adrenaline rock song .\\nI could n't help but picture NASCAR drivers flying by on the track to this .\\nI love big , anthemic songs that are calls to action - so this one is case and point . ''", '@sum_score': 4.889628140900232, '@visited': False}}
{'v_id': '5148', 'v_type': 'Song', 'attributes': {'id': 5148, 'description': "TFK frontman/songwriter Trevor McNevan had the idea for this song for some time .\\nHe told NewReleaseTuesday : `` Although it 's in the same vein as some of our other high-octane songs , like ` Fire It Up , ' it 's quite different .\\nI wanted it to have that U2 Vertigo type vibe ; that big stadium energy with single notes on the main guitar riff , instead of chords . ''\\nThis was a challenge for McNevan to sing as its one of the highest songs vocally he 's writt

## Vector-based Similarity Search

In [8]:
import numpy as np
df = G.get_neighbors(start_nodes=17418216, start_node_type="User", edge_types="downloaded")
song_ids = set(df['id'])
songs = G.fetch_nodes(song_ids, vector_attribute_name="emb_1", node_type="Song")
embeddings = np.array(list(songs.values()))
user_embedding = np.mean(embeddings, axis=0)
print(embeddings.shape)

(59, 1536)


In [9]:
print(user_embedding.shape)

(1536,)
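
The user embedding above is the element-wise mean of the embeddings of the 59 downloaded songs, which is why its shape collapses from `(59, 1536)` to `(1536,)`. The sketch below reproduces that pooling step on toy three-dimensional vectors in plain Python and compares vectors with cosine distance. The notebook does not state which metric `G.search` uses, so cosine distance here is an assumption, and `mean_pool` and `cosine_distance` are hypothetical helper names.

```python
import math


def mean_pool(vectors):
    """Element-wise mean of equal-length vectors (same idea as np.mean(..., axis=0))."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Two toy "song embeddings" downloaded by one user
toy_song_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_user_embedding = mean_pool(toy_song_embeddings)

print(toy_user_embedding)
print(cosine_distance(toy_user_embedding, [1.0, 1.0, 0.0]))
```

Averaging treats every downloaded song equally; weighting by recency or play count would be a different design choice that this notebook does not explore.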


In [10]:
vector_search_results = G.search(
    data=user_embedding.tolist(),
    vector_attribute_name="emb_1",
    node_type="Song",
    limit=4,
    return_attributes=["id", "description"]
)
for node in vector_search_results:
    print(node)

{'id': 5996, 'distance': 0.08263087, 'description': "Frontman Justin Pierre told Alternative Press that the genesis of this song harks back to 2007 : `` The original idea for this song came while we were recording Even If It Kills Me .\\nI had a few lines for verses and part of the chorus , but I was n't sure where it was going .\\nThere was n't enough time to explore it back then , so we saved it for this record .\\nI had this strange image in my head of two people sitting on the roof of a house at night in the fall , shivering slightly and silently together ; their only comfort each other .\\nI see this song as a melancholy anthem for those of us who really wish we were more than just a science experiment , but fear that that 's probably not the case . ''"}
{'id': 2424, 'distance': 0.08361697, 'description': "Lead singer Christian Lindskog : `` This was a possible title for the record for me .\\nIt was one of the first things I wrote and was very much tied to the intro piece .\\nDuri

## Hybrid Search

In [19]:
import pandas as pd

# Extract graph-based recommendations
graph_recs = []
for result in graph_search_results:
    if isinstance(result, dict):  # Ensure result is a dictionary
        for key, songs in result.items():
            if isinstance(songs, list):  # Ensure songs is a list
                for song in songs:
                    if isinstance(song, dict) and 'attributes' in song:
                        graph_recs.append({
                            "id": int(song.get('v_id', 0)),  # Default ID to 0 if missing
                            "graph_score": song['attributes'].get('@sum_score', 0),  # Default to 0 if missing
                            "description": song['attributes'].get('description', 'No description available')  # Default description
                        })

# Extract vector-based recommendations
vector_recs = [
    {
        "id": int(node.get("id", 0)),  # Default ID to 0 if missing
        "vector_distance": node.get("distance", 1.0),  # Default max distance to 1.0
        "description": node.get("description", "No description available")  # Default description
    }
    for node in vector_search_results
]

# Convert to DataFrame
df_graph = pd.DataFrame(graph_recs)
df_vector = pd.DataFrame(vector_recs)

# Convert `id` column to int before merging
df_graph['id'] = df_graph['id'].astype(int)
df_vector['id'] = df_vector['id'].astype(int)

# Normalize Graph Scores
if not df_graph.empty and 'graph_score' in df_graph:
    df_graph['graph_score_norm'] = (df_graph['graph_score'] - df_graph['graph_score'].min()) / \
                                   (df_graph['graph_score'].max() - df_graph['graph_score'].min())
else:
    df_graph['graph_score_norm'] = 0  # Default normalization if empty

# Normalize Vector Scores (inverse because lower is better)
if not df_vector.empty and 'vector_distance' in df_vector:
    df_vector['vector_score_norm'] = (df_vector['vector_distance'].max() - df_vector['vector_distance']) / \
                                     (df_vector['vector_distance'].max() - df_vector['vector_distance'].min())
else:
    df_vector['vector_score_norm'] = 0  # Default normalization if empty

# Merge both DataFrames
df_merged = pd.merge(df_graph, df_vector, on='id', how='outer')

# Fill missing scores and descriptions
df_merged['graph_score_norm'] = df_merged['graph_score_norm'].fillna(0)
df_merged['vector_score_norm'] = df_merged['vector_score_norm'].fillna(0)
df_merged['description_x'] = df_merged['description_x'].fillna(df_merged['description_y'])
df_merged = df_merged.rename(columns={"description_x": "description"}).drop(columns=["description_y"])

# Compute Hybrid Score with weight α = 0.5
alpha = 0.5
df_merged['hybrid_score'] = alpha * df_merged['graph_score_norm'] + (1 - alpha) * df_merged['vector_score_norm']

# Sort by Hybrid Score and select top 4
df_sorted = df_merged.sort_values(by='hybrid_score', ascending=False).head(4)

# Print results one by one
for _, row in df_sorted.iterrows():
    print(f"ID: {row['id']}")
    print(f"Hybrid Score: {row['hybrid_score']:.4f}")
    print(f"Description: {row['description']}\n" + "-" * 80)

ID: 4425
Hybrid Score: 0.5000
Description: Thousand Foot Krutch vocalist Trevor McNevan -LRB- from NewReleaseTuesday -RRB- : `` This is another firecracker , more of an adrenaline rock song .\nI could n't help but picture NASCAR drivers flying by on the track to this .\nI love big , anthemic songs that are calls to action - so this one is case and point . ''
--------------------------------------------------------------------------------
ID: 5996
Hybrid Score: 0.5000
Description: Frontman Justin Pierre told Alternative Press that the genesis of this song harks back to 2007 : `` The original idea for this song came while we were recording Even If It Kills Me .\nI had a few lines for verses and part of the chorus , but I was n't sure where it was going .\nThere was n't enough time to explore it back then , so we saved it for this record .\nI had this strange image in my head of two people sitting on the roof of a house at night in the fall , shivering slightly and silently together ; the
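
The hybrid score above is `alpha * graph_score_norm + (1 - alpha) * vector_score_norm`, with both inputs min-max normalized and the vector distance inverted because a lower distance is better. One edge case is worth noting: when all scores in a list are equal (for example a single result), the pandas version divides zero by zero, which yields `NaN` and is then turned into 0 by `fillna(0)`. The sketch below restates the same scoring in plain Python with an explicit guard for that case. The helper names, the fallback value of `1.0`, and the toy numbers are assumptions for illustration, not part of the original notebook.

```python
def min_max(values, invert=False, constant=1.0):
    """Scale values to [0, 1]; return `constant` for every item when all values are equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [constant] * len(values)
    if invert:  # lower raw value is better (e.g. a distance)
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(graph_scores, vector_distances, alpha=0.5):
    """Combine {id: graph_score} and {id: distance} into a ranked list of (id, hybrid_score)."""
    graph_norm = (
        dict(zip(graph_scores, min_max(list(graph_scores.values()))))
        if graph_scores else {}
    )
    vector_norm = (
        dict(zip(vector_distances, min_max(list(vector_distances.values()), invert=True)))
        if vector_distances else {}
    )
    ids = set(graph_norm) | set(vector_norm)
    combined = {
        i: alpha * graph_norm.get(i, 0.0) + (1 - alpha) * vector_norm.get(i, 0.0)
        for i in ids
    }
    # Highest score first; ties broken by id so the order is deterministic
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Toy inputs: song 2 appears in both result sets
toy_graph_scores = {1: 4.0, 2: 2.0}
toy_vector_distances = {2: 0.25, 3: 0.75}
ranking = hybrid_scores(toy_graph_scores, toy_vector_distances, alpha=0.5)
print(ranking)
```

A song missing from one result set contributes 0 for that component, which mirrors the `fillna(0)` step in the cell above.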

## Vector Search for QA System

In [22]:
import openai

def get_question_embedding(question, model="text-embedding-ada-002"):
    """Convert a question into an embedding (List[float]) using OpenAI API."""
    try:
        response = openai.embeddings.create(input=[question], model=model)
        return response.data[0].embedding  # Returns the embedding as List[float]
    except Exception as e:
        print(f"Error generating embedding: {e}")
        return None  # Return None if there is an error

question = 'Are there any songs in the dataset that mention a specific genre (e.g., "rock," "jazz," "pop") in their descriptions?'
embedding = get_question_embedding(question)

2025-03-07 19:24:11,314 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


In [61]:
retrieved_songs = G.search(
    data=embedding,
    vector_attribute_name="emb_1",
    node_type="Song",
    limit=10,
)
print(retrieved_songs)

[{'id': 504, 'distance': 0.2286872, 'description': "Justin Timberlake ends his The 20/20 Experience album on an ambient note .\\nHe explained in an interview with MySpace that such songs as this tune and `` Strawberry Bubblegum '' were his attempts to put his own spin on Radiohead 's electronic rock .\\n`` I want people to close their eyes and listen to this album , '' he said .\\n`` I really do think my effort with the last album was to make people dance , and I think with this album , I wrote a lot of songs that make me want to sing ... and dance . ''"}, {'id': 63, 'distance': 0.2309018, 'description': "Brooklyn quintet Friends may be a Pop band , but their lyrical content is more thoughtful than many others in their genre .\\n`` One thing that I think is really cool is to create a sense of joy and empowerment , '' said Samantha Urbani to The Independent .\\n`` But not just with a message of jubilant positivity - we have a song about death and anticipating it called ` Ideas on Ghosts

In [62]:
def generate_llm_prompt(question, retrieved_songs):
    """Generate a structured prompt for an LLM to answer a question using retrieved song descriptions."""
    
    prompt_template = """You are an expert in analyzing song descriptions and answering user queries based on provided song data.

### Task:
Answer the following question based on the retrieved song descriptions. Use the given information to generate a relevant, concise, and insightful response.

### Question:
{question}

### Retrieved Songs:
{retrieved_songs}

Each song entry consists of:
- **id**: A unique identifier for the song.
- **description**: A textual description of the song.

### Instructions:
1. **Analyze** the descriptions to find relevant information related to the question.
2. **Synthesize** an answer using the most relevant songs.
3. **Provide explanations** or insights if necessary.
4. **Avoid speculation** beyond the provided descriptions.

### Response:
"""

    # Format the retrieved songs as a structured string
    song_entries = "\n".join(
        [f"- id: {song['id']}\n Description: {song['description']}" for song in retrieved_songs]
    )

    return prompt_template.format(question=question, retrieved_songs=song_entries)

llm_prompt = generate_llm_prompt(question, retrieved_songs)

# Print the generated prompt
print(llm_prompt)

You are an expert in analyzing song descriptions and answering user queries based on provided song data.

### Task:
Answer the following question based on the retrieved song descriptions. Use the given information to generate a relevant, concise, and insightful response.

### Question:
Are there any songs in the dataset that mention a specific genre (e.g., "rock," "jazz," "pop") in their descriptions?

### Retrieved Songs:
- id: 504
 Description: Justin Timberlake ends his The 20/20 Experience album on an ambient note .\nHe explained in an interview with MySpace that such songs as this tune and `` Strawberry Bubblegum '' were his attempts to put his own spin on Radiohead 's electronic rock .\n`` I want people to close their eyes and listen to this album , '' he said .\n`` I really do think my effort with the last album was to make people dance , and I think with this album , I wrote a lot of songs that make me want to sing ... and dance . ''
- id: 63
 Description: Brooklyn quintet Frie

In [63]:
def chat_with_openai(llm_prompt, model="gpt-4"):
    """Send the LLM prompt to OpenAI's API and get a response using the new OpenAI API (>=1.0.0)."""
    try:
        client = openai.OpenAI()  # New API requires initializing a client
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful assistant that analyzes song descriptions."},
                {"role": "user", "content": llm_prompt}
            ],
            temperature=0.7
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error querying OpenAI: {e}")
        return None

response = chat_with_openai(llm_prompt)
print(response)

2025-03-07 20:19:11,043 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Yes, there are several songs in the dataset that mention specific genres in their descriptions. 

For instance, the song with id: 504 mentions "electronic rock" as Justin Timberlake tried to incorporate this genre into his own music. Similarly, song id: 63 is described as a "pop" song by the Brooklyn quintet Friends. 

The song with id: 525 spans sounds of multiple genres like "folk", "country", "rock" and "Americana". The genre isn't predetermined but the song's sound can lean towards a specific genre depending on the instruments used.

Song id: 5982 is described as a "pop leaning toe-tapper" by the artist Jewel, indicating its pop genre. Song id: 3095 draws on reggae-inspired genres like dancehall and dub.

Finally, the song with id: 366 implements "jazz" elements, sampling a track from Jazz saxophonist Bill Evans.


## Hybrid Search for QA System

In [64]:
retrieved_songs = G.search(
    data=embedding,
    vector_attribute_name="emb_1",
    node_type="Song",
    limit=5,
)
print(retrieved_songs)

[{'id': 63, 'distance': 0.2309018, 'description': "Brooklyn quintet Friends may be a Pop band , but their lyrical content is more thoughtful than many others in their genre .\\n`` One thing that I think is really cool is to create a sense of joy and empowerment , '' said Samantha Urbani to The Independent .\\n`` But not just with a message of jubilant positivity - we have a song about death and anticipating it called ` Ideas on Ghosts , ' but it 's a dance song so you reach this level of cathartic movement and energised dancing , but with a sense of awareness of all the negativity and all of the sadness and all of the spectrum of human feelings .\\nI think it 's really cool because a lot of dance songs are just sexy come-ons .\\nThis is thinking persons ' pop music , I hope . ''"}, {'id': 8418, 'distance': 0.230859, 'description': "According to Cave , the songs on Push The Sky were composed from `` Googling curiosities -LSB- and -RSB- being entranced by exotic Wikipedia entries , '' wi

In [65]:
retrieved_song_ids = [song["id"] for song in retrieved_songs]
neighbors = G.run_query("get_neighbors", params={"input": retrieved_song_ids, "k": 5})
print(neighbors)

[{'SimilarSongs': [{'v_id': '2512', 'v_type': 'Song', 'attributes': {'id': 2512, 'description': "This brooding song features the lyric , `` I believe in God . ''\\nSpeaking with The Sun , Cave put his apparent statement of faith into context into context .\\n`` I 'm talking about believing in God , believing in mermaids and believing in 72 virgins , '' he said .\\n`` The song starts reeling off the options and I think what I 'm really saying is I believe in the idea of believing in things.The fact that we humans have that capacity or need to believe is not a shameful thing as some people might see it , but hugely endearing . ''\\nCave also references an incident in December 2010 when he crashed his Jaguar car into a speed camera .\\n`` I do driver alertness course , '' he intones in his baritone croon .\\nThe singer and his twin sons all walked away from the accident unharmed , but Cave had to attend a driver alertness course as a punishment .\\n`` I rather liked it , '' he told The Su

In [66]:
# Index vector search results by song id (a dict keyed by id makes de-duplication easy)
combined_results = {song["id"]: {
    "id": song["id"],
    "description": song["description"]
} for song in retrieved_songs}

# Add graph search results (ensuring no duplicates)
for song in neighbors[0]["SimilarSongs"]:
    song_id = int(song["v_id"])
    if song_id not in combined_results:  # Avoid duplicates
        combined_results[song_id] = {
            "id": song_id,
            "description": song["attributes"]["description"]
        }

# Convert the merged dictionary back to a list format
retrieved_songs_combined = list(combined_results.values())
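
The merge above can also be packaged as a small reusable function. The sketch below is a plain-Python version that runs on toy data shaped like the outputs shown in this notebook: flat dictionaries from the vector search, and `v_id`/`attributes` dictionaries from the installed query. `flatten_vertex` and `merge_by_id` are hypothetical helper names, and the descriptions are placeholders.

```python
def flatten_vertex(vertex):
    """Convert a query result vertex ({'v_id': ..., 'attributes': {...}}) to {'id', 'description'}."""
    return {
        "id": int(vertex["v_id"]),
        "description": vertex["attributes"]["description"],
    }


def merge_by_id(primary, secondary):
    """Merge two lists of {'id', 'description'} dicts, keeping the first occurrence of each id."""
    merged = {}
    for song in list(primary) + list(secondary):
        song_id = int(song["id"])
        # setdefault keeps the entry already stored, so `primary` wins on duplicates
        merged.setdefault(song_id, {"id": song_id, "description": song["description"]})
    return list(merged.values())


# Toy data mirroring the two result shapes
toy_vector_hits = [
    {"id": 63, "distance": 0.23, "description": "vector hit A"},
    {"id": 8418, "distance": 0.23, "description": "vector hit B"},
]
toy_graph_hits = [
    {"v_id": "2512", "v_type": "Song", "attributes": {"id": 2512, "description": "graph hit C"}},
    {"v_id": "63", "v_type": "Song", "attributes": {"id": 63, "description": "duplicate of A"}},
]

toy_combined = merge_by_id(toy_vector_hits, [flatten_vertex(v) for v in toy_graph_hits])
print([song["id"] for song in toy_combined])
```

Since Python dictionaries preserve insertion order, vector hits stay ahead of graph neighbors in the combined list, which also fixes the order in which songs appear in the prompt.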

In [67]:
llm_prompt = generate_llm_prompt(question, retrieved_songs_combined)

# Print the generated prompt
print(llm_prompt)

You are an expert in analyzing song descriptions and answering user queries based on provided song data.

### Task:
Answer the following question based on the retrieved song descriptions. Use the given information to generate a relevant, concise, and insightful response.

### Question:
Are there any songs in the dataset that mention a specific genre (e.g., "rock," "jazz," "pop") in their descriptions?

### Retrieved Songs:
- id: 63
 Description: Brooklyn quintet Friends may be a Pop band , but their lyrical content is more thoughtful than many others in their genre .\n`` One thing that I think is really cool is to create a sense of joy and empowerment , '' said Samantha Urbani to The Independent .\n`` But not just with a message of jubilant positivity - we have a song about death and anticipating it called ` Ideas on Ghosts , ' but it 's a dance song so you reach this level of cathartic movement and energised dancing , but with a sense of awareness of all the negativity and all of the 

In [68]:
response = chat_with_openai(llm_prompt)
print(response)

2025-03-07 20:19:29,883 - httpx - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
Yes, several songs in the dataset do mention specific genres in their descriptions. 

- Song with id: 63 is described as a Pop band, indicating the genre of pop. 

- The song with id: 3095 draws on reggae-inspired genres like dancehall and dub. 

- For the song with id: 504, Justin Timberlake is attempting to put his own spin on Radiohead's electronic rock, implying the genre of electronic rock. 

- The song with id: 4565 is by the Australian Alternative Rock band Nick Cave and the Bad Seeds, indicating the genre of alternative rock. 

- Song with id: 5362 is described as a Rock-Operaesque tune, suggesting the genre of rock. 

- Lastly, the song with id: 3148 contains a sample from Jamaican reggae musician King Sporty's track, indicating the genre of reggae. 

Therefore, the genres mentioned in these song descriptions include pop, reggae, dancehall, dub, electronic r

## Drop Graph

In [69]:
G.drop_graph()

2025-03-07 20:54:05,507 - tigergraphx.core.managers.schema_manager - INFO - Dropping graph: KGRec...
2025-03-07 20:54:09,792 - tigergraphx.core.managers.schema_manager - INFO - Graph dropped successfully.

