# **Creating a Neural Search Method**
Now that I've got some helpful methods in the **`01. Writing Postgres Queries`** notebook, I can spend some time writing a "neural search" method, which the API will use.

# Setup
The cells below will set up the rest of the notebook.

I'll start by configuring the kernel: 

In [1]:
# Change the working directory 
%cd ..

# Enable the autoreload extension, which will automatically load in new code as it's written
%load_ext autoreload
%autoreload 2

d:\data\programming\neural-needledrop\api


Now I'll import some necessary modules:

In [2]:
# General import statements
import pandas as pd
import datetime
from IPython.display import Markdown, display

# Importing custom modules
from utils.settings import (
    POSTGRES_USER,
    POSTGRES_PASSWORD,
    POSTGRES_HOST,
    POSTGRES_PORT,
    POSTGRES_DB,
)
import utils.postgres_queries as pg_queries
import utils.postgres as postgres
from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import sessionmaker, declarative_base

I'll also set up my connection to the Postgres server: 

In [3]:
# Create the connection string to the database
postgres_connection_string = f"postgresql://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}:{POSTGRES_PORT}/{POSTGRES_DB}"

# Create the connection engine
engine = create_engine(postgres_connection_string)
metadata = MetaData()
session = sessionmaker(bind=engine)()
Base = declarative_base()
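
One caveat with building the connection string from an f-string: a password containing characters such as `@`, `/`, or `:` produces a URL that parses incorrectly. Below is a minimal sketch of percent-encoding the credentials with the standard library. `build_postgres_url` is a hypothetical helper (not part of `utils`), and the credentials are made up:

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, port, db):
    """Build a PostgreSQL URL, percent-encoding the user and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db}"
    )


# Made-up credentials; note the `@` and `/` inside the password
example_url = build_postgres_url("my_user", "p@ss/word", "localhost", 5432, "my_db")
print(example_url)
```

If the real password only contains URL-safe characters, the plain f-string above behaves the same way.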

# **Prototyping Neural Search**
Below, I'm going to prototype a search method. 

I'll start by parameterizing the search:

In [18]:
# Parameterize the search
query = "Love Me Forever, which was a very nice, airy, euphoric ballad, is just a minute and change"
release_date_filter = [datetime.datetime(2010, 1, 1), datetime.datetime(2024, 6, 1)]
video_type_filter = ["album_review", "mixtape_review"]
review_score_filter = [8, 10]

# Extra parameters
n_chunks_to_consider_initially = 250
n_most_similar_chunks_per_video = 10
n_videos_to_return = 10
n_segment_chunks_to_showcase = 3

Now I'll try to identify the segment chunks most similar to the query:

In [19]:
# Run the query
similar_chunks_df = pg_queries.most_similar_embeddings_to_text_filtered(
    text=query,
    engine=engine,
    n=n_chunks_to_consider_initially,
    release_date_filter=release_date_filter,
    video_type_filter=video_type_filter,
    review_score_filter=review_score_filter,
    include_text=True,
)
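
The `cos_sim` column suggests that chunks are ranked by cosine similarity between the query embedding and each chunk embedding. The body of `most_similar_embeddings_to_text_filtered` isn't shown in this notebook, so the following is only an illustrative sketch of that ranking idea on made-up 3-dimensional vectors. `cosine_similarity` and `top_n_similar` are hypothetical helpers, not functions from `utils.postgres_queries`:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n_similar(query_embedding, chunks, n):
    """Score every chunk against the query and keep the n best."""
    scored = [
        {
            "url": chunk["url"],
            "text": chunk["text"],
            "cos_sim": cosine_similarity(query_embedding, chunk["embedding"]),
        }
        for chunk in chunks
    ]
    return sorted(scored, key=lambda row: row["cos_sim"], reverse=True)[:n]


# Made-up chunks with toy embeddings (real embeddings have many more dimensions)
toy_chunks = [
    {"url": "video_a", "text": "airy ballad", "embedding": [1.0, 0.0, 0.0]},
    {"url": "video_b", "text": "harsh noise", "embedding": [0.0, 1.0, 0.0]},
    {"url": "video_a", "text": "euphoric closer", "embedding": [1.0, 1.0, 0.0]},
]
toy_results = top_n_similar([1.0, 0.0, 0.0], toy_chunks, n=2)
for row in toy_results:
    print(row["text"], round(row["cos_sim"], 3))
```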

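With this in hand, I want to aggregate the chunks by video and determine which videos score highest. Each video's score is the median cosine similarity of its top chunks multiplied by the number of those chunks, so a video with several similar chunks can outrank one with a single strong match.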

In [20]:
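# Sort by similarity (in case the query result isn't already ordered), then
# group by `url` and keep the top `n_most_similar_chunks_per_video` chunks per video
aggregated_similar_chunks_df = (
    similar_chunks_df.sort_values("cos_sim", ascending=False)
    .groupby("url")
    .head(n_most_similar_chunks_per_video)
)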

# Aggregate the similarity statistics
aggregated_similar_chunks_df = (
    aggregated_similar_chunks_df.groupby("url")
    .agg(
        median_similarity=("cos_sim", "median"),
        n_similar_chunks=("cos_sim", "count"),
    )
    .reset_index()
)

# Add a weighted median similarity column
aggregated_similar_chunks_df["weighted_median_similarity"] = (
    aggregated_similar_chunks_df["median_similarity"]
    * aggregated_similar_chunks_df["n_similar_chunks"]
)

# Sort by the weighted median similarity
aggregated_similar_chunks_df = aggregated_similar_chunks_df.sort_values(
    "weighted_median_similarity", ascending=False
).head(n_videos_to_return)
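
To make the scoring concrete, here is the same aggregation in plain Python on made-up similarity values. It's a sketch of the logic above rather than a replacement for the pandas code; `score_videos` and the toy data are hypothetical:

```python
from collections import defaultdict
from statistics import median


def score_videos(chunks, n_per_video, n_videos):
    """Rank videos by median similarity of their best chunks times chunk count."""
    # Walk chunks from most to least similar, keeping at most n_per_video per URL
    sims_by_url = defaultdict(list)
    for url, cos_sim in sorted(chunks, key=lambda c: c[1], reverse=True):
        if len(sims_by_url[url]) < n_per_video:
            sims_by_url[url].append(cos_sim)

    scores = []
    for url, sims in sims_by_url.items():
        med = median(sims)
        scores.append(
            {
                "url": url,
                "median_similarity": med,
                "n_similar_chunks": len(sims),
                "weighted_median_similarity": med * len(sims),
            }
        )

    # Highest weighted score first, then keep the top n_videos
    scores.sort(key=lambda s: s["weighted_median_similarity"], reverse=True)
    return scores[:n_videos]


# Made-up (url, cos_sim) pairs
toy_chunks = [
    ("video_a", 0.90),
    ("video_b", 0.80),
    ("video_b", 0.78),
    ("video_b", 0.76),
    ("video_a", 0.50),
    ("video_c", 0.95),
]
toy_scores = score_videos(toy_chunks, n_per_video=2, n_videos=2)
for score in toy_scores:
    print(score["url"], round(score["weighted_median_similarity"], 2))
```

In this toy data, `video_c` has the single best chunk but drops out of the top two, because multiplying by the chunk count favors videos with several matching chunks (capped at `n_per_video`).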

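Now, we're going to get some metadata about each video back. This involves writing the aggregated results to a scratch table in Postgres (`temp_similar_chunks`, a regular table that is replaced on each run rather than a true `TEMPORARY` table), and then joining it to the `video_metadata` table: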

In [21]:
# Write the aggregated_similar_chunks_df DataFrame to a table called `temp_similar_chunks`.
# `engine.begin()` commits on exit, so the table is visible to the query below.
with engine.begin() as conn:
    aggregated_similar_chunks_df.to_sql(
        "temp_similar_chunks", conn, if_exists="replace", index=False
    )

# Now, select all `video_metadata` columns for each video in the `temp_similar_chunks` table
similar_chunks_video_metadata_df = postgres.query_postgres(
    """
    SELECT 
        video_metadata.*, 
        temp_similar_chunks.median_similarity, 
        temp_similar_chunks.n_similar_chunks, 
        temp_similar_chunks.weighted_median_similarity
    FROM video_metadata
    JOIN temp_similar_chunks
    ON video_metadata.url = temp_similar_chunks.url
    ORDER BY temp_similar_chunks.weighted_median_similarity DESC
    """,
    engine=engine,
)
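
The upload-then-join pattern can be tried without a Postgres server. The sketch below uses Python's built-in `sqlite3` as a stand-in (an assumption for illustration only; the notebook itself talks to Postgres through SQLAlchemy), with made-up rows and a trimmed-down `video_metadata` table that has only `url` and `title` columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down stand-in for the real `video_metadata` table
conn.execute("CREATE TABLE video_metadata (url TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO video_metadata VALUES (?, ?)",
    [
        ("video_a", "Album A REVIEW"),
        ("video_b", "Album B REVIEW"),
        ("video_c", "Album C REVIEW"),
    ],
)

# Scratch table holding the per-video scores, replaced on each run
conn.execute("DROP TABLE IF EXISTS temp_similar_chunks")
conn.execute(
    "CREATE TABLE temp_similar_chunks (url TEXT, weighted_median_similarity REAL)"
)
conn.executemany(
    "INSERT INTO temp_similar_chunks VALUES (?, ?)",
    [("video_b", 1.58), ("video_a", 1.40)],
)

# The inner join keeps only videos that appear in the scratch table
joined_rows = conn.execute(
    """
    SELECT
        video_metadata.url,
        video_metadata.title,
        temp_similar_chunks.weighted_median_similarity
    FROM video_metadata
    JOIN temp_similar_chunks
    ON video_metadata.url = temp_similar_chunks.url
    ORDER BY temp_similar_chunks.weighted_median_similarity DESC
    """
).fetchall()
conn.close()

for row in joined_rows:
    print(row)
```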

Finally, we're going to prepare our results: the `similar_chunks_video_metadata_df`, plus the `n_segment_chunks_to_showcase` most similar segment chunks for each video.

In [22]:
# Create a DataFrame containing the segment chunks I want to showcase
segment_chunks_to_showcase_df = (
    (
        similar_chunks_df[
            similar_chunks_df["url"].isin(
                similar_chunks_video_metadata_df["url"].unique()
            )
        ]
        .sort_values("cos_sim", ascending=False)
        .groupby("url")
        .head(n_segment_chunks_to_showcase)
        .sort_values(["url", "cos_sim"], ascending=False)
    )
    .groupby("url")
    .agg(
        top_segment_chunks=("text", list),
    )
    .reset_index()
)

# Merge this DataFrame with the video metadata
segment_chunks_to_showcase_df = segment_chunks_to_showcase_df.merge(
    similar_chunks_video_metadata_df, on="url"
).sort_values("weighted_median_similarity", ascending=False)
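
The chained `groupby` calls above boil down to: for each selected video, collect the texts of its best few chunks, best first. Here's a plain-Python sketch of that step on made-up rows (`top_chunks_per_video` is a hypothetical helper):

```python
from collections import defaultdict


def top_chunks_per_video(chunks, selected_urls, k):
    """Map each selected URL to the texts of its k most similar chunks."""
    top = defaultdict(list)
    # Visit chunks from most to least similar so each list is ordered best first
    for chunk in sorted(chunks, key=lambda c: c["cos_sim"], reverse=True):
        url = chunk["url"]
        if url in selected_urls and len(top[url]) < k:
            top[url].append(chunk["text"])
    return dict(top)


# Made-up rows shaped like `similar_chunks_df`
toy_rows = [
    {"url": "video_a", "cos_sim": 0.90, "text": "a-best"},
    {"url": "video_a", "cos_sim": 0.50, "text": "a-third"},
    {"url": "video_a", "cos_sim": 0.70, "text": "a-second"},
    {"url": "video_b", "cos_sim": 0.80, "text": "b-best"},
    {"url": "video_c", "cos_sim": 0.95, "text": "c-best"},
]
toy_top_chunks = top_chunks_per_video(toy_rows, {"video_a", "video_b"}, k=2)
print(toy_top_chunks)
```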

Now, I'm going to print the results using some nice formatting:

In [25]:
for _, row in segment_chunks_to_showcase_df.head(3).iterrows():
    display(Markdown(f"**{row['title']}**"))
    display(Markdown("\n".join([f"* {chunk}" for chunk in row['top_segment_chunks']])))


**Twin Shadow- Forget ALBUM REVIEW**

* I'm still feeling a really BVA on this album. A B-F-1. I think if Twin Shadow shoots for a more well-produced LP, the next go around, he's going to come out with something amazing.
* Maybe a flood here, a very slightly off-note there. However, I wouldn't say it makes the record difficult to listen to. The song writing is still really good in the sound effects, post-production, top notch. Some moments on the cell peak get really slow and moody and lurk like a mutant. The tracks castles in the snow or even the opener of this album come to mind. Very dark moments here, but still keep a very steady beat. While other tracks kind of feel like they're living the hayday of the danceier side of post punk in the 80s, a lot of disco influence, something that kind of reminds me of bands
* like shriek back on classic singles like my spine is the baseline. Just that type of music that was highly, highly influential to that DFA record sound. Songs like for now and shooting holes at the moon kind of bring that to mind. I am enjoying it even though it's not the most amazing dance music I've ever heard. I do kind of want to warn you guys that I am a little bit of a sucker for this kind of stuff. Whether it's a slower track or a faster track, there are even a few intros on songs that I don't really care for with this thing. But no matter where a track may start or end up, the hooks are always razor sharp.

**Krallice - Crystalline Exhaustion ALBUM REVIEW**

* There's a variety of synth patches on the track too, huge horns, spectral bells, a short and sweet ambient interlude too, which goes over better than some of the stuff the band was toying with on Goby for gotten. Then finally we have the massive title track, which lasts 14 minutes and is really an edge
* chilly drone that sits on top of the crunchy and ever changing layers of drums and guitars that hang below. It's not really changing up Kralis's tried and true formula and sound, but it's there. It's providing something. It's an element. The pummeling performance and production on this track is really the driver of this song. And that's okay. It's a very visceral and unforgiving beginning to this LP.
* on the whole project. Really, the drumming on this track is a total workout in creates and almost hypnotic sensation for the six minutes at last, which is really enhanced by the ethereal synth leads and the spiraling riffs that you can really get lost in.

**The Drums- Self-Titled ALBUM REVIEW**

* Painfully emotional and about is straightforward as a punch to the face It's extreme pops and plasticity. It's the drums self titled full length You should really be putting your ears on that score But that's pretty much just how I'm feeling let me know what you guys think of this to you love this thing
* Do you hate it and why and also favorite this video and make YouTube a better place Anthony Fantano the drums forever
* Just kind of irrelevant if you're kind of a vocal Pyrrist if you really like your vocals on key you like them concise this may not be the release for you But if you're looking for something that is more emotional than it is on key something bright light and Summary something that's gonna make new order joy division and the Smith's fans warm where it counts

# **Functionalized Version**
I took all of the code above and wrapped it into a single function, `neural_search`, in `utils.search`. Below, I'll show it off:

In [27]:
from utils.search import neural_search

# Run the search
neural_search(
    query=query,
    release_date_filter=release_date_filter,
    video_type_filter=video_type_filter,
    review_score_filter=review_score_filter,
    n_most_similar_chunks_per_video=n_most_similar_chunks_per_video,
    n_videos_to_return=n_videos_to_return,
    n_segment_chunks_to_showcase=n_segment_chunks_to_showcase,
)

[{'url': 'https://www.youtube.com/watch?v=CB-HKdkM7hg',
  'top_segment_chunks': ["I'm still feeling a really BVA on this album. A B-F-1. I think if Twin Shadow shoots for a more well-produced LP, the next go around, he's going to come out with something amazing.",
   "Maybe a flood here, a very slightly off-note there. However, I wouldn't say it makes the record difficult to listen to. The song writing is still really good in the sound effects, post-production, top notch. Some moments on the cell peak get really slow and moody and lurk like a mutant. The tracks castles in the snow or even the opener of this album come to mind. Very dark moments here, but still keep a very steady beat. While other tracks kind of feel like they're living the hayday of the danceier side of post punk in the 80s, a lot of disco influence, something that kind of reminds me of bands",
   "like shriek back on classic singles like my spine is the baseline. Just that type of music that was highly, highly influen