# Connecting to Ollama with LangChain & cleaning transcripts

This Jupyter notebook demonstrates how to use Ollama, a tool for running large language models locally, together with LangChain to clean and process transcription data. It assumes that the Ollama server is already running in the background (i.e. `ollama serve`) and that the model used below (`llama3.1`) is available locally.

In [None]:
%%capture
!pip install langchain
!pip install langchain_community

In [None]:
import io
import pandas as pd
from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate  # Added; the top-level `from langchain import PromptTemplate` is deprecated

llm = Ollama(model="llama3.1", stop=["<|eot_id|>"]) # Added stop token

def get_model_response(user_prompt, system_prompt):
    # NOTE: Use a plain string (not an f-string) and no whitespace inside the
    # curly braces, so that PromptTemplate can fill in the placeholders.
    template = """
        <|begin_of_text|>
        <|start_header_id|>system<|end_header_id|>
        {system_prompt}
        <|eot_id|>
        <|start_header_id|>user<|end_header_id|>
        {user_prompt}
        <|eot_id|>
        <|start_header_id|>assistant<|end_header_id|>
        """

    # Added prompt template
    prompt = PromptTemplate(
        input_variables=["system_prompt", "user_prompt"],
        template=template
    )
    
    # Fill in the template and invoke the model (calling llm(...) directly is deprecated)
    response = llm.invoke(prompt.format(system_prompt=system_prompt, user_prompt=user_prompt))
    
    return response

### TODOs: Write a prompt to summarize the transcribed text

- Reference: https://www.reddit.com/r/ChatGPT/comments/11twe7z/prompt_to_summarize/
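
As a starting point for this TODO, a summarization prompt could be assembled along the following lines. This is only a minimal sketch: the helper name `build_summary_prompt`, the `max_bullets` parameter, and the wording of the instructions are assumptions made for illustration and are not taken from the referenced thread.

```python
def build_summary_prompt(text, max_bullets=5):
    """Return a user prompt asking for a bullet-point summary of `text`.

    Hypothetical helper: the name, parameter, and wording are illustrative only.
    """
    return (
        f"Summarize the following transcript in at most {max_bullets} bullet points. "
        "Keep names, numbers, and key terms as they appear, and do not add "
        "information that is not in the transcript.\n\n"
        f"Transcript:\n{text.strip()}"
    )


# Example: build a prompt for a short piece of text
example_prompt = build_summary_prompt("  hello world  ", max_bullets=3)
print(example_prompt)
```

The returned string can be passed as `user_prompt` to `get_model_response`; the instructions will likely need tuning for your transcripts.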

In [None]:
# Read the CSV into pandas, join the transcribed segments, and get the cleaned transcript.
# Long transcripts may need to be split into chunks before being sent to the model.
# Use notebook 01 to transcribe the text from YouTube, then read the result here.
transcripts = pd.read_csv("content.csv")

text = " ".join(transcripts["text"].dropna().astype(str))  # skip empty rows so join() only sees strings
user_prompt = f"<Your prompt>: {text}" # TODO
system_prompt = "You are a helpful assistant that follows the given prompt." # TODO
print(get_model_response(user_prompt, system_prompt))
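
If the joined transcript is too long for the model's context window, one option is to split it into overlapping word-based chunks and process each chunk separately. The sketch below is one possible approach, not the notebook's prescribed method: `chunk_text`, `max_words`, and `overlap` are hypothetical names, and counting words is only a rough stand-in for counting tokens.

```python
def chunk_text(text, max_words=500, overlap=50):
    """Split `text` into chunks of at most `max_words` words.

    Consecutive chunks share `overlap` words so that sentences cut at a
    boundary still appear whole in one of the chunks.
    Hypothetical helper: names and default sizes are illustrative only.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current chunk has reached the end of the text
        if start + max_words >= len(words):
            break
    return chunks


# Small demonstration with ten words, chunks of four, overlap of one
demo = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(demo, max_words=4, overlap=1):
    print(chunk)

# Assumed usage with this notebook's helper (requires a running Ollama server):
# partial = [get_model_response(f"<Your prompt>: {c}", system_prompt)
#            for c in chunk_text(text)]
```

The per-chunk responses can then be joined, or summarized once more in a final call, depending on how long the combined output is.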