# **Writing Postgres Queries**
To build the API, I'll need several different Postgres queries. I'll prototype and test them throughout this notebook.

# Setup
The cells below will set up the rest of the notebook.

I'll start by configuring the kernel: 

In [1]:
# Change the working directory 
%cd ..

# Enable the autoreload extension, which automatically reloads imported modules when their code changes
%load_ext autoreload
%autoreload 2

d:\data\programming\neural-needledrop\api


Now I'll import some necessary modules:

In [2]:
# General import statements
from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import sessionmaker, declarative_base
import datetime
import pandas as pd

# Importing custom modules
from utils.openai import embed_text
from utils.settings import (
    POSTGRES_USER,
    POSTGRES_PASSWORD,
    POSTGRES_HOST,
    POSTGRES_PORT,
    POSTGRES_DB,
    LOG_TO_CONSOLE,
)
from utils.logging import get_logger
from utils.postgres import query_postgres
import utils.postgres_queries as pg_queries

# Set up a logger for this notebook
logger = get_logger("postgres_notebook", log_to_console=LOG_TO_CONSOLE)

Finally, I'll set up some Postgres connectors: 

In [3]:
# Create the connection string to the database
postgres_connection_string = f"postgresql://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}:{POSTGRES_PORT}/{POSTGRES_DB}"

# Create the connection engine
engine = create_engine(postgres_connection_string)
metadata = MetaData()
session = sessionmaker(bind=engine)()
Base = declarative_base()
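
One caveat about the connection string above: if the password contains characters such as `@` or `/`, interpolating it directly into the URL produces a string that cannot be parsed correctly. A minimal sketch of one way around this is to percent-encode the credentials first. The helper name `build_postgres_url` and the example credentials are hypothetical and are not part of this project's `utils` modules.

```python
from urllib.parse import quote_plus


def build_postgres_url(user: str, password: str, host: str, port: int, db: str) -> str:
    """Build a postgresql:// URL, percent-encoding the user and password."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db}"


# Hypothetical credentials, for illustration only
example_url = build_postgres_url("api_user", "p@ss/word", "localhost", 5432, "needledrop")
print(example_url)
```

The encoded string can be passed to `create_engine` in the same way as the f-string above.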

# **Query Experimentation**
Below, I've collected some of my experiments with writing the API queries.

### Searching for Similar Embeddings
The crux of this project: embedding arbitrary text, and then finding the most similar embeddings to that text.

In [4]:
# First: define the query, and then embed it
query_text = "Shredding guitar, heavy drums, and fast-paced vocals"
query_embedding = embed_text(query_text)

# Now, get the most similar embeddings
most_similar_embeddings_df = pg_queries.most_similar_embeddings(query_embedding, engine, n=5)

# Show this DataFrame
most_similar_embeddings_df

Unnamed: 0,id,url,embedding_type,start_segment,end_segment,segment_length,embedding,cos_sim
0,AQzJxUAq7Is_16_20,https://www.youtube.com/watch?v=AQzJxUAq7Is,segment_chunk,16,20,4,"[0.0031118393,0.027207196,-0.0075575546,-0.005...",0.067297
1,AQzJxUAq7Is_4_8,https://www.youtube.com/watch?v=AQzJxUAq7Is,segment_chunk,4,8,4,"[0.0103615215,0.061028168,-0.029346589,-0.0213...",0.057054
2,AQzJxUAq7Is_24_32,https://www.youtube.com/watch?v=AQzJxUAq7Is,segment_chunk,24,32,8,"[-0.007932259,0.047170874,-0.026811881,-0.0116...",0.035351
3,AQzJxUAq7Is_0_8,https://www.youtube.com/watch?v=AQzJxUAq7Is,segment_chunk,0,8,8,"[0.010563213,0.023397792,-0.022741012,-0.01457...",0.023266
4,ApMeFBLy3v0_92_96,https://www.youtube.com/watch?v=ApMeFBLy3v0,segment_chunk,92,96,4,"[0.024722725,0.034520008,-0.032080524,-0.01482...",0.018605
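
The internals of `pg_queries.most_similar_embeddings` aren't shown in this notebook. Judging by the `cos_sim` column, the ranking is presumably by cosine similarity. As a rough, pure-Python sketch of that idea (not the project's actual query, which runs inside Postgres), the ranking could look like this. The function names and the toy vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n_similar(query, candidates, n=5):
    """Return the n (id, score) pairs with the highest cosine similarity to the query."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Toy 3-dimensional vectors; real embeddings have many more dimensions
toy_candidates = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.0, 1.0, 0.0],
    "chunk_c": [1.0, 1.0, 0.0],
}
ranked = top_n_similar([1.0, 0.1, 0.0], toy_candidates, n=2)
print([cid for cid, _ in ranked])
```

Sorting in descending order matches the output above, where the highest `cos_sim` appears in the first row.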


### Retrieving Video Metadata
To display information about a video, I'll need a general method that looks up its metadata. It should also support fetching metadata for several videos at once.

In [5]:
# Determine the IDs of the videos we want metadata for
video_ids = ["uCX9A3xROQo"]

# Get the video metadata for these videos
video_metadata_df = pg_queries.retrieve_multiple_video_metadata(video_ids, engine)

video_metadata_df

Unnamed: 0,id,title,length,channel_id,channel_name,short_description,description,view_ct,url,small_thumbnail_url,large_thumbnail_url,video_type,review_score,publish_date,scrape_date
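
For reference, fetching several videos in one round trip usually comes down to a parameterized `IN` clause. The sketch below illustrates that pattern using the standard library's `sqlite3` as a stand-in for Postgres, so it runs without a database server. The table name `video_metadata`, the reduced column set, and the example rows are assumptions; the real `retrieve_multiple_video_metadata` may be implemented differently.

```python
import sqlite3


def retrieve_video_metadata(conn: sqlite3.Connection, video_ids: list[str]) -> list[tuple]:
    """Fetch (id, title) rows for every requested video ID using bound parameters."""
    if not video_ids:
        return []
    # One "?" placeholder per ID, so values are bound rather than interpolated
    placeholders = ", ".join("?" for _ in video_ids)
    sql = f"SELECT id, title FROM video_metadata WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, video_ids).fetchall()


# In-memory stand-in table with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video_metadata (id TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO video_metadata VALUES (?, ?)",
    [("vid_a", "Example Review A"), ("vid_b", "Example Review B"), ("vid_c", "Example Review C")],
)
rows = retrieve_video_metadata(conn, ["vid_c", "vid_a", "missing"])
print(rows)
```

In this sketch, an ID that is not in the table simply contributes no rows, and an empty ID list short-circuits before any SQL is built.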


### Retrieving a Video's Transcript
I'll also need a method that retrieves a video's entire transcript.

In [6]:
# Determine the ID of the video we want the transcript for
video_id = "uCX9A3xROQo"

# Query for the entire transcript
video_transcript_df = pg_queries.retrieve_multiple_video_transcripts([video_id], engine)

video_transcript_df

Unnamed: 0,url,text,segment_id,segment_seek,segment_start,segment_end,video_id
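
The transcript comes back as one row per segment, so a consumer that wants a single block of text has to stitch the segments together. Here is a minimal sketch of that step, assuming `segment_id` is an integer that increases in playback order (the column names come from the header above; the function name and example rows are made up).

```python
from collections import defaultdict


def stitch_transcripts(segments: list[dict]) -> dict[str, str]:
    """Join segment texts into one transcript per video, ordered by segment_id."""
    by_video = defaultdict(list)
    for seg in segments:
        by_video[seg["video_id"]].append(seg)
    return {
        video_id: " ".join(
            s["text"].strip() for s in sorted(segs, key=lambda s: s["segment_id"])
        )
        for video_id, segs in by_video.items()
    }


# Made-up segment rows, deliberately out of order
example_segments = [
    {"video_id": "vid_a", "segment_id": 1, "text": " second segment."},
    {"video_id": "vid_a", "segment_id": 0, "text": "First segment,"},
    {"video_id": "vid_b", "segment_id": 0, "text": "Another video."},
]
transcripts = stitch_transcripts(example_segments)
print(transcripts["vid_a"])
```

With a DataFrame such as `video_transcript_df`, the same rows could be produced with `to_dict("records")` before calling the function.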


### Searching for Similar Embeddings (Filtered Options)
Below, I'm going to try out a method that searches for similar embeddings over a filtered set of videos.


In [8]:
# Parameterize the search
# release_date_filter = [datetime.datetime(2023, 1, 1), datetime.datetime(2024, 6, 1)]
# video_type_filter = ["album_review", "mixtape_review"]
# review_score_filter = None

# Run most_similar_embeddings_to_text_filtered from the postgres_queries module
most_similar_embeddings_filtered_df = (
    pg_queries.most_similar_embeddings_to_text_filtered(
        text="tiny string piano embellishment",
        engine=engine,
        n=300,
        # release_date_filter=release_date_filter,
        # video_type_filter=video_type_filter,
        # review_score_filter=review_score_filter,
        include_text=True,
    )
)

# Show the DataFrame
most_similar_embeddings_filtered_df

Unnamed: 0,id,url,embedding_type,start_segment,end_segment,segment_length,embedding,cos_sim,text
0,dCdMYNbK8Vs_48_52,https://www.youtube.com/watch?v=dCdMYNbK8Vs,segment_chunk,48,52,4,"[-0.0062035797,-0.0024234902,0.0007060562,-0.0...",0.546421,"title thanks to a grand upward, mobile musical..."
1,dCdMYNbK8Vs_40_48,https://www.youtube.com/watch?v=dCdMYNbK8Vs,segment_chunk,40,48,8,"[-0.009083154,0.012684595,-0.005445553,-0.0252...",0.477517,tiny string and piano embellishments orbit aro...
2,cS0bI-chYN8_32_36,https://www.youtube.com/watch?v=cS0bI-chYN8,segment_chunk,32,36,4,"[0.011554193,0.013490616,-0.009588517,-0.00783...",0.411181,"Benjamin's vocal delivery is stunning, and the..."
3,CEpKouCO6L8_64_72,https://www.youtube.com/watch?v=CEpKouCO6L8,segment_chunk,64,72,8,"[0.01007167,0.046383973,0.0017122536,-0.037335...",0.404509,"again dramatic, our paginated guitar chords. A..."
4,CEpKouCO6L8_32_36,https://www.youtube.com/watch?v=CEpKouCO6L8,segment_chunk,32,36,4,"[0.026274744,0.031433225,-0.0072821644,-0.0244...",0.396319,The band starts with nothing but feverish riff...
...,...,...,...,...,...,...,...,...,...
295,c5j8_b8aFnw_64_72,https://www.youtube.com/watch?v=c5j8_b8aFnw,segment_chunk,64,72,8,"[0.013763807,-0.005630357,-0.024416354,-0.0368...",0.244556,with just a little bit more character and dyna...
296,AOWmL1eydWI_16_24,https://www.youtube.com/watch?v=AOWmL1eydWI,segment_chunk,16,24,8,"[0.012059503,0.03208188,0.012448288,-0.0225206...",0.244307,Kinda sounds like both of them are struggling ...
297,CUNeMprEkDA_16_24,https://www.youtube.com/watch?v=CUNeMprEkDA,segment_chunk,16,24,8,"[0.06327926,0.041028872,-0.004717264,0.0121969...",0.244013,"In between songs and musical movements, Mick's..."
298,cUXCNgJSmzU_40_48,https://www.youtube.com/watch?v=cUXCNgJSmzU,segment_chunk,40,48,8,"[0.013907309,0.027188161,-0.042376224,-0.00639...",0.243964,Just kind of irrelevant if you're kind of a vo...
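
The three optional filters suggest a query whose `WHERE` clause is assembled only from the filters that were actually supplied. Below is a hedged sketch of how that assembly might work; it is not the code in `postgres_queries`. It assumes that `release_date_filter` and `review_score_filter` are `[low, high]` pairs, that `video_type_filter` is a list of allowed values, and that the columns are `publish_date`, `video_type`, and `review_score`, as in the metadata header earlier. The function name `build_filter_clause` is hypothetical.

```python
import datetime


def build_filter_clause(release_date_filter=None, video_type_filter=None, review_score_filter=None):
    """Return (where_sql, params) covering only the filters that were supplied."""
    conditions, params = [], {}
    if release_date_filter is not None:
        conditions.append("publish_date BETWEEN :start_date AND :end_date")
        params["start_date"], params["end_date"] = release_date_filter
    if video_type_filter:  # None or an empty list means "no video type filter"
        names = [f"video_type_{i}" for i in range(len(video_type_filter))]
        conditions.append("video_type IN (" + ", ".join(f":{n}" for n in names) + ")")
        params.update(zip(names, video_type_filter))
    if review_score_filter is not None:
        conditions.append("review_score BETWEEN :min_score AND :max_score")
        params["min_score"], params["max_score"] = review_score_filter
    where_sql = "WHERE " + " AND ".join(conditions) if conditions else ""
    return where_sql, params


where_sql, params = build_filter_clause(
    release_date_filter=[datetime.datetime(2023, 1, 1), datetime.datetime(2024, 6, 1)],
    video_type_filter=["album_review", "mixtape_review"],
)
print(where_sql)
print(params)
```

The `:name` placeholders follow the bind-parameter style that SQLAlchemy's `text()` construct accepts, so the clause and the parameter dictionary could be passed along together instead of formatting values into the SQL string.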
