## Initialize libraries, set up the Config, and retrieve transcripts

In [1]:
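# Import only the helper names this notebook uses, instead of a wildcard import.
from helper import Config, getCombinedTranscripts, retrieveTopics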

config = Config()
config.setFromEnv()

videoNames = [
    "New Quizzes Video",
    "Rearrange Playlist video",
    "IMSE 514 Presentation",
    "New Google Assignments in Canvas",
    "Piazza Introduction Workshop",
]

videoToUse = "New Google Assignments in Canvas"
transcriptToUse = getCombinedTranscripts(
    videoToUse,
    captionsFolder=config.captionsFolder,
)

## Run BERTopic to get Topics over Time

### Choice of Representation Model
Three representation models are available:

    'simple': Uses BERTopic alone, takes ~40 seconds to run, and has no token cost

The other two options use Large Language Models (LLMs) to turn the keywords extracted for a topic into a more human-interpretable topic name. Both incur a token cost for the LLM calls.

    'openai': Uses BERTopic and OpenAI to generate interpretable topic titles, takes ~6 minutes to run
    'langchain': Uses BERTopic and LangChain to generate interpretable topic titles, takes ~3 minutes to run


The default is `langchain`, as it is faster than `openai` and seems to generate better titles.
Its prompt is also easier to modify if you want some flexibility there.

This can be changed by adjusting the `representationModelType` variable in the config file.
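
As a rough illustration of how such a setting might be read and validated, here is a minimal sketch. The environment variable name `REPRESENTATION_MODEL_TYPE` and the function `resolve_representation_model_type` are assumptions made for this example; the actual `Config` class in `helper` may work differently.

```python
import os

# The three options described above.
VALID_REPRESENTATION_MODELS = ("simple", "openai", "langchain")


def resolve_representation_model_type(env=None, default="langchain"):
    """Return a validated representation model type.

    NOTE: the variable name REPRESENTATION_MODEL_TYPE is hypothetical;
    check the project's Config class for the real setting name.
    """
    env = os.environ if env is None else env
    # Normalize whitespace and case so " OpenAI " is accepted as "openai".
    value = env.get("REPRESENTATION_MODEL_TYPE", default).strip().lower()
    if value not in VALID_REPRESENTATION_MODELS:
        raise ValueError(
            f"Unknown representation model type {value!r}; "
            f"expected one of {VALID_REPRESENTATION_MODELS}"
        )
    return value
```

Failing early on an unrecognized value avoids discovering a typo only after a multi-minute topic modeling run.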


### Using KeyBERT
[KeyBERT is a recommended library](https://maartengr.github.io/BERTopic/getting_started/tips_and_tricks/tips_and_tricks.html#keybert-bertopic) to use when setting up the `BERTopic` model. 

It helps with extracting pertinent keywords in a document, and should help produce topics with more relevant keywords attached. 

Defaults to `True`, and can be adjusted via the `useKeyBERT` variable in the config file.
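
One way KeyBERT can be combined with BERTopic, following the approach in the linked tips page, is to extract keywords per document, build a vocabulary from them, and pass that vocabulary to the vectorizer. The sketch below is an illustration only and not necessarily what `retrieveTopics` does; the function names `build_vocabulary` and `fit_with_keybert_vocabulary` are hypothetical, and `docs` is assumed to be a list of several transcript strings.

```python
def build_vocabulary(keywords_per_doc):
    """Flatten KeyBERT-style output into a de-duplicated vocabulary.

    Expects one list per document, each holding (keyword, score) tuples.
    The order of first appearance is preserved.
    """
    seen = {}
    for doc_keywords in keywords_per_doc:
        for keyword, _score in doc_keywords:
            seen.setdefault(keyword, None)
    return list(seen)


def fit_with_keybert_vocabulary(docs):
    """Fit BERTopic using a vocabulary built from KeyBERT keywords."""
    # Imported lazily so build_vocabulary works without these packages.
    from bertopic import BERTopic
    from keybert import KeyBERT
    from sklearn.feature_extraction.text import CountVectorizer

    # For a list of documents, KeyBERT returns one keyword list per document.
    keywords = KeyBERT().extract_keywords(docs)
    vocabulary = build_vocabulary(keywords)

    # Restrict the topic representation to the KeyBERT vocabulary.
    vectorizer_model = CountVectorizer(vocabulary=vocabulary)
    return BERTopic(vectorizer_model=vectorizer_model).fit(docs)
```

Restricting the vectorizer to keywords that KeyBERT considers relevant is what should make the topic keywords more pertinent, at the cost of an extra keyword extraction pass over the transcripts.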

In [2]:
topicsOverTime, topicModel = retrieveTopics(videoToUse, transcriptToUse, config=config)

### Visualize topics over time

In [3]:
topicModel.visualize_topics_over_time(topicsOverTime)