# **Writing Postgres Queries**
To write the API, I'll need several different Postgres queries. I'll test them out throughout this notebook.

# **Setup**
The cells below will set up the rest of the notebook.

I'll start by configuring the kernel: 

In [1]:
# Change the working directory 
%cd ..

# Enable the autoreload extension, which will automatically load in new code as it's written
%load_ext autoreload
%autoreload 2

d:\data\programming\neural-needledrop\api


Now I'll import some necessary modules:

In [2]:
# General import statements
from sqlalchemy import create_engine, MetaData
from sqlalchemy.orm import sessionmaker, declarative_base
import datetime
import pandas as pd

# Importing custom modules
from utils.openai import embed_text
from utils.settings import (
    POSTGRES_USER,
    POSTGRES_PASSWORD,
    POSTGRES_HOST,
    POSTGRES_PORT,
    POSTGRES_DB,
    LOG_TO_CONSOLE,
)
from utils.logging import get_logger
from utils.postgres import query_postgres
import utils.postgres_queries as pg_queries

# Set up a logger for this notebook
logger = get_logger("postgres_notebook", log_to_console=LOG_TO_CONSOLE)

Finally, I'll set up some Postgres connectors: 

In [3]:
# Create the connection string to the database
postgres_connection_string = f"postgresql://{POSTGRES_USER}:{POSTGRES_PASSWORD}@{POSTGRES_HOST}:{POSTGRES_PORT}/{POSTGRES_DB}"

# Create the connection engine
engine = create_engine(postgres_connection_string)
metadata = MetaData()
session = sessionmaker(bind=engine)()
Base = declarative_base()
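
One caveat with the f-string connection string above: a password containing characters such as `@` or `/` would produce an invalid URL. Below is a minimal sketch of one way to guard against that using only the standard library. `build_postgres_url` is a hypothetical helper (it is not part of `utils.postgres`), and the credentials are placeholders.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, port, db):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    # quote_plus escapes reserved characters such as "@", ":" and "/"
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db}"
    )


# Placeholder credentials, for illustration only
example_url = build_postgres_url("api_user", "p@ss/word", "localhost", 5432, "needledrop")
print(example_url)
```

SQLAlchemy also provides `sqlalchemy.engine.URL.create`, which accepts the individual parts and handles the escaping itself; either approach should work here.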

# **Query Experimentation**
Below, I've collected some of my experiments with writing the API queries.

### Searching for Similar Embeddings
The crux of this project: embedding arbitrary text, and then finding the most similar embeddings to that text.

In [4]:
# First: define the query, and then embed it
query_text = "Shredding guitar, heavy drums, and fast-paced vocals"
query_embedding = embed_text(query_text)

# Now, get the most similar embeddings
most_similar_embeddings_df = pg_queries.most_similar_embeddings(query_embedding, engine, n=5)

# Show this DataFrame
most_similar_embeddings_df

Unnamed: 0,id,url,embedding_type,start_segment,end_segment,segment_length,embedding,cos_sim
0,Zn3F8mZSrCk_4_8,https://www.youtube.com/watch?v=Zn3F8mZSrCk,segment_chunk,4,8,4,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0
1,Zn3F8mZSrCk_8_12,https://www.youtube.com/watch?v=Zn3F8mZSrCk,segment_chunk,8,12,4,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0
2,Zn3F8mZSrCk_12_16,https://www.youtube.com/watch?v=Zn3F8mZSrCk,segment_chunk,12,16,4,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0
3,Zn3F8mZSrCk_16_20,https://www.youtube.com/watch?v=Zn3F8mZSrCk,segment_chunk,16,20,4,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0
4,Zn3F8mZSrCk_0_4,https://www.youtube.com/watch?v=Zn3F8mZSrCk,segment_chunk,0,4,4,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0
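
The implementation of `pg_queries.most_similar_embeddings` isn't shown in this notebook, so here is a rough in-memory illustration of what the `cos_sim` column represents: each candidate is ranked by cosine similarity to the query vector, and the top `n` are kept. The function names and the toy vectors are assumptions made for this sketch; the real query presumably does this work inside Postgres.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_n_similar(query, candidates, n=5):
    """Rank (id, vector) pairs by cosine similarity to the query, best first."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Toy 2-dimensional vectors; real embeddings have far more dimensions
toy_candidates = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),
    ("c", [1.0, 1.0]),
]
ranked = top_n_similar([1.0, 0.0], toy_candidates, n=2)
print(ranked)
```

A similarity of 1.0 means the two vectors point in exactly the same direction, while 0.0 means they are orthogonal.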


### Retrieving Video Metadata
To display information about a video, I'll need a general method that retrieves its metadata. It should also support fetching metadata for several videos at once.

In [5]:
# Determine the IDs of the videos we want metadata for
video_ids = ["uCX9A3xROQo"]

# Get the metadata for these videos
video_metadata_df = pg_queries.retrieve_multiple_video_metadata(video_ids, engine)

video_metadata_df

Unnamed: 0,id,title,length,channel_id,channel_name,short_description,description,view_ct,url,small_thumbnail_url,large_thumbnail_url,video_type,review_score,publish_date,scrape_date
0,uCX9A3xROQo,Armand Hammer - We Buy Diabetic Test Strips AL...,512,UCt7fwAhXDy3oNFTAzF2o8Pw,theneedledrop,Listen: https://armandhammer.bandcamp.com/albu...,Listen: https://armandhammer.bandcamp.com/albu...,154042,https://www.youtube.com/watch?v=uCX9A3xROQo,https://i.ytimg.com/vi/uCX9A3xROQo/default.jpg,https://i.ytimg.com/vi/uCX9A3xROQo/sddefault.jpg,album_review,9,2023-10-05,2024-01-06 00:52:26.235028
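
Fetching metadata for several IDs at once usually comes down to an `IN (...)` clause with bound parameters. The sketch below illustrates that pattern against an in-memory SQLite table so it runs without a database server. The table layout, the sample rows, and `fetch_rows_by_ids` are all assumptions for illustration, not the actual `retrieve_multiple_video_metadata` implementation.

```python
import sqlite3


def fetch_rows_by_ids(conn, ids):
    """Fetch rows whose id is in `ids` using bound parameters."""
    if not ids:
        return []
    # One "?" placeholder per ID; values are bound, never interpolated
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT id, title FROM video_metadata "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(sql, list(ids)).fetchall()


# In-memory stand-in table with made-up rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE video_metadata (id TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO video_metadata VALUES (?, ?)",
    [("vid_a", "First"), ("vid_b", "Second"), ("vid_c", "Third")],
)
rows = fetch_rows_by_ids(conn, ["vid_c", "vid_a"])
print(rows)
```

Binding the IDs rather than formatting them into the SQL string keeps the query safe if the IDs ever come from user input, which they likely will once this sits behind an API.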


### Retrieving a Video's Transcript
Another method will retrieve a video's entire transcript.

In [6]:
# Determine the ID of the video we want the transcript for
video_id = "uCX9A3xROQo"

# Query for the entire transcript
video_transcript_df = pg_queries.retrieve_multiple_video_transcripts([video_id], engine)

video_transcript_df

Unnamed: 0,url,text,segment_id,segment_seek,segment_start,segment_end,video_id
0,https://www.youtube.com/watch?v=uCX9A3xROQo,"Hi everyone, Speaker of the House here, the I...",0,0,0,9,uCX9A3xROQo
1,https://www.youtube.com/watch?v=uCX9A3xROQo,We by diabetic test strips.,1,0,9,12,uCX9A3xROQo
2,https://www.youtube.com/watch?v=uCX9A3xROQo,"Okay, for how much?",2,0,12,14,uCX9A3xROQo
3,https://www.youtube.com/watch?v=uCX9A3xROQo,"Uh, zero dollars.",3,0,14,16,uCX9A3xROQo
4,https://www.youtube.com/watch?v=uCX9A3xROQo,It's just a title.,4,0,16,17,uCX9A3xROQo
...,...,...,...,...,...,...,...
137,https://www.youtube.com/watch?v=uCX9A3xROQo,Hit the like if you like.,137,49042,501,502,uCX9A3xROQo
138,https://www.youtube.com/watch?v=uCX9A3xROQo,Please subscribe and please don't cry.,138,49042,502,503,uCX9A3xROQo
139,https://www.youtube.com/watch?v=uCX9A3xROQo,Hit the bell as well over here next to my hea...,139,49042,503,507,uCX9A3xROQo
140,https://www.youtube.com/watch?v=uCX9A3xROQo,Hit that up with a link to subscribe to the c...,140,49042,507,509,uCX9A3xROQo
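
Since the transcript comes back as one row per segment, a consumer that wants the full text has to stitch the segments together in order. Here is a minimal sketch of that step in plain Python, using segments 1 through 4 from the output above. `join_transcript` is a hypothetical helper, not something defined in `utils`.

```python
def join_transcript(segments):
    """Concatenate segment texts in segment_id order into one string."""
    ordered = sorted(segments, key=lambda seg: seg["segment_id"])
    return " ".join(seg["text"].strip() for seg in ordered)


# Segments 1-4 from the output above, deliberately shuffled
segments = [
    {"segment_id": 3, "text": "Uh, zero dollars."},
    {"segment_id": 1, "text": "We by diabetic test strips."},
    {"segment_id": 4, "text": "It's just a title."},
    {"segment_id": 2, "text": "Okay, for how much?"},
]
full_text = join_transcript(segments)
print(full_text)
```

With the DataFrame itself, sorting by `segment_id` and joining the `text` column should give the same result.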


### Searching for Similar Embeddings (Filtered Options)
Below, I'm going to write a method to search for similar embeddings (over a filtered set of videos).


In [7]:
# Parameterize the search
release_date_filter = [datetime.datetime(2023, 1, 1), datetime.datetime(2024, 6, 1)]
video_type_filter = ["album_review", "mixtape_review"]
review_score_filter = None

# Run most_similar_embeddings_to_text_filtered from the postgres_queries module
most_similar_embeddings_filtered_df = (
    pg_queries.most_similar_embeddings_to_text_filtered(
        text="gentle swimming underwater love",
        engine=engine,
        n=100,
        release_date_filter=release_date_filter,
        video_type_filter=video_type_filter,
        review_score_filter=review_score_filter,
        include_text=True,
    )
)

# Show the DataFrame
most_similar_embeddings_filtered_df

Unnamed: 0,id,url,embedding_type,start_segment,end_segment,segment_length,embedding,cos_sim,text
0,x1QKEcFSYZw_56_64,https://www.youtube.com/watch?v=x1QKEcFSYZw,segment_chunk,56,64,8,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0,"Very bluntly, I get the fuck out of my life. O..."
1,Zn3F8mZSrCk_0_4,https://www.youtube.com/watch?v=Zn3F8mZSrCk,segment_chunk,0,4,4,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0,"Oh! Hi, everyone. Big Thin He dropped Hano her..."
2,x1QKEcFSYZw_48_56,https://www.youtube.com/watch?v=x1QKEcFSYZw,segment_chunk,48,56,8,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0,Without any crazy frills or window dressing or...
3,Zn3F8mZSrCk_16_20,https://www.youtube.com/watch?v=Zn3F8mZSrCk,segment_chunk,16,20,4,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0,first release with them was an album of music ...
4,Zn3F8mZSrCk_8_12,https://www.youtube.com/watch?v=Zn3F8mZSrCk,segment_chunk,8,12,4,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0,running especially considering that he played ...
...,...,...,...,...,...,...,...,...,...
95,x1QKEcFSYZw_16_24,https://www.youtube.com/watch?v=x1QKEcFSYZw,segment_chunk,16,24,8,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0,Because while Fiona Apple has been nothing of ...
96,x1QKEcFSYZw_24_32,https://www.youtube.com/watch?v=x1QKEcFSYZw,segment_chunk,24,32,8,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0,And she's so fatalistic about it to in her wri...
97,x1QKEcFSYZw_32_40,https://www.youtube.com/watch?v=x1QKEcFSYZw,segment_chunk,32,40,8,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0,And the cherry on top is letting the drummer g...
98,x1QKEcFSYZw_40_48,https://www.youtube.com/watch?v=x1QKEcFSYZw,segment_chunk,40,48,8,"[0.0030553022,0.03946543,-0.0013984011,0.01894...",1.0,Where she sings about wanting to make a mistak...
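
Because each filter can be `None`, the underlying query needs a `WHERE` clause that only includes the conditions actually supplied. The sketch below shows one way to assemble such a clause with named bind parameters. It is an assumption about the approach, not the code in `postgres_queries`: `build_filter_clause` is hypothetical, the column names are taken from the metadata output earlier in this notebook, and treating `review_score_filter` as a `(min, max)` pair is a guess.

```python
import datetime


def build_filter_clause(release_date_filter=None, video_type_filter=None,
                        review_score_filter=None):
    """Return a (where_sql, params) pair covering only the filters that are set."""
    conditions = []
    params = {}
    if release_date_filter is not None:
        start, end = release_date_filter
        conditions.append("publish_date BETWEEN :start_date AND :end_date")
        params["start_date"] = start
        params["end_date"] = end
    # An empty list is treated the same as no filter, to avoid "IN ()"
    if video_type_filter:
        names = []
        for i, video_type in enumerate(video_type_filter):
            params[f"video_type_{i}"] = video_type
            names.append(f":video_type_{i}")
        conditions.append(f"video_type IN ({', '.join(names)})")
    if review_score_filter is not None:
        # Assumed shape: (min_score, max_score)
        low, high = review_score_filter
        conditions.append("review_score BETWEEN :min_score AND :max_score")
        params["min_score"] = low
        params["max_score"] = high
    where_sql = ("WHERE " + " AND ".join(conditions)) if conditions else ""
    return where_sql, params


where_sql, params = build_filter_clause(
    release_date_filter=[datetime.datetime(2023, 1, 1), datetime.datetime(2024, 6, 1)],
    video_type_filter=["album_review", "mixtape_review"],
    review_score_filter=None,
)
print(where_sql)
print(params)
```

The `:name` placeholder style matches what SQLAlchemy's `text()` construct expects, so the resulting string and dictionary could be passed to it together.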
