<a href="https://colab.research.google.com/github/ayushpatnaikgit/CRRao.jl/blob/main/whether_substantive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

 **License:**
 Copyright (c) 2024 Ayush Patnaik. This work is licensed under a [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/).

## Introduction

This tutorial shows how to run a large language model (LLM) locally in Google Colab for natural language processing tasks. It covers installing the required packages, setting up the model with Ollama, and applying it to a concrete task: summarising YouTube video transcripts. Running the model inside the Colab runtime avoids sending text to an external LLM API, which helps with data privacy, and the same approach applies to other text processing needs.


## Setting up

In [None]:
# @title Install packages

%%capture

# LangChain packages, used to call the locally running LLM from Python
!pip install langchain
!pip install langchain-core
!pip install langchain-community

In [None]:
# @title Download and install Ollama
%%capture

MODEL_NAME = "llama3.1" # @param {"type":"string","placeholder":"Model Name"}
# Find Ollama models: https://ollama.com/library/

!sudo apt install -y pciutils # Install pciutils (used to detect GPUs)
!curl -fsSL https://ollama.com/install.sh | sh # Download and install Ollama
!nohup ollama serve & # Start the Ollama server in the background
!ollama pull {MODEL_NAME} # Download the model
!nohup ollama run {MODEL_NAME} & # Run the model in the background
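
Because the server is started in the background, the commands that follow it can run before it is ready to accept requests. The sketch below is one way to wait for it. It assumes Ollama is listening on its default address, `http://localhost:11434`; the helper name `wait_for_server` and its timeout values are illustrative and not part of Ollama or LangChain.

```python
import time
import urllib.error
import urllib.request

# Assumption: Ollama's default address; change this if the server is configured differently.
OLLAMA_URL = "http://localhost:11434"

# Talk to the local server directly, ignoring any proxy settings in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_server(url=OLLAMA_URL, timeout=30.0, interval=1.0):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass.

    Returns True if the server responded, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the server is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the notebook, a call such as `wait_for_server()` could be placed between the `ollama serve` and `ollama pull` lines, or at the top of the next cell.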

In [None]:
from langchain_community.llms import Ollama

llm = Ollama(model=MODEL_NAME)

In [None]:
llm.invoke("hello")

'Hello! How are you today? Is there something I can help you with or would you like to chat?'

## LLMs in action: Summarising YouTube videos

In [None]:
%%capture
!pip install youtube-transcript-api # Package for extracting transcripts from YouTube videos

In [None]:
from youtube_transcript_api import YouTubeTranscriptApi

In [None]:
# @title Function to extract the transcript of a YouTube video

def extract_transcript(video_id):
    episode_elements = YouTubeTranscriptApi.get_transcript(video_id)  # use the argument, not a hard-coded ID
    episode_text = [time_stamp['text'] for time_stamp in episode_elements]
    episode_full_transcript = " ".join([str(item) for item in episode_text])
    return episode_full_transcript

In [None]:
# @title Extract transcript of EiE episode 57

transcript_E57 = extract_transcript("86PxrQ_lkp4") # the video ID is the value of the "v" parameter in the video's URL
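
Since the video ID is taken from the URL, a small helper can pull it out so that a full link can be pasted instead. This is a sketch using only the standard library; `video_id_from_url` is a hypothetical helper, not part of `youtube-transcript-api`, and it handles only the common URL shapes listed in the comments.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url):
    """Return the video ID from a YouTube URL, or None if it cannot be found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    segments = [part for part in parsed.path.split("/") if part]

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return segments[0] if segments else None

    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        # Other common shapes: /embed/<id>, /shorts/<id>, /live/<id>
        if len(segments) == 2 and segments[0] in ("embed", "shorts", "live"):
            return segments[1]

    return None
```

The result can then be passed on, for example `extract_transcript(video_id_from_url("https://www.youtube.com/watch?v=86PxrQ_lkp4"))`.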

In [None]:
transcript_E57

"[Music] welcome gentle reader for this uh fascinating and important episode of Everything is Everything um Amit do you think the green screen will work correctly with the video insertion it is it is the best green screen known to it is a natural green screen and and I know you are particularly happy Aisha because you're a tech geek and you're sitting here today with a piece of technology that no Indian YouTuber uh you know can use as well as you what is that technology you're so excited about I have a hairlight 150 million kilm behind [Music] me so as a said it's an important episode and that's really because it's a subject that is close to both our hearts and it goes to a fundamental question about why do we care about the things that we care about at one level when we live our lives our Quest is personal I want to improve my own life I want I want every day to be as pleasurable for me I want to make money I want to become successful I want to read good books etc etc those are person

In [None]:
# @title Use the LLM to summarise the transcript

llm.invoke("I want you to clean up a transcript of a conversation with Ajay Shah. I will provide you with a text file in UTF-8 encoding. The character length is about 63000. Will you be able to handle that document and clean it up? It is a somewhat complex task because it is a conversation between Ajay Shah, the economist from India who also happens to have deep knowledge of digital technologies and open source software, and Amit Varma, a writer, journalist and podcaster. Broadly, Amit asks the questions or adds comments, and Ajay gives his views and understanding of the issues. The transcript is not explicitly marked with who is speaking. Since it is a conversation, the format is not always a question and answer. It can also be in the form of a comment and response. I want to do some collaborative thinking with you to delve deeper into the nature of the arguments Ajay makes, what the strengths and shortcomings of the arguments are, the underlying dominant schools, theories or worldviews of economics (e.g. liberalism, neo-liberalism and several others) which might be influencing the argument, whether there can be alternative ways of looking at the subject, and so on. I am familiar with the various theories or schools of economics only as a curious person who can comprehend the ideas and arguments. Naturally, having the cleaned transcript is crucial for this to happen fruitfully: " + transcript_E57)

'This text appears to be a transcript or summary of a conversation between two individuals, likely economists or development experts. The discussion revolves around various issues in the field of development economics, including:\n\n1. **The role of referees in journal publications**: One person expresses frustration with the typical process of having referees review papers, which can lead to an excessive focus on minor details rather than addressing the overall importance and relevance of the research.\n2. **The prioritization of rigor over impact**: The conversation highlights how the emphasis on rigorous methodology and causal identification has become a dominant paradigm in development economics, potentially stifling innovation and critical thinking.\n3. **The "frat boy world" of journal editors and referees**: This phrase is used to describe a perceived culture of exclusivity and elitism among some economists, where the focus is on publishing papers with complex methodologies rath
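
A transcript of about 63,000 characters may not fit in the context window the model is served with (Ollama controls this through its `num_ctx` setting), in which case part of the text can be dropped before the model sees it. One common workaround is to summarise the transcript in pieces and then combine the partial summaries. The sketch below illustrates that idea; `chunk_text` and `summarise_long` are hypothetical helpers, and the chunk size, overlap, and prompt wording are assumptions to adjust, not recommended values.

```python
def chunk_text(text, max_chars=6000, overlap=200):
    """Split text into chunks of at most max_chars, breaking at spaces where possible.

    Consecutive chunks share `overlap` characters so that sentences cut at a
    boundary still appear whole in one of the two chunks.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Prefer to break at the last space inside the window.
            space = text.rfind(" ", start + overlap + 1, end)
            if space != -1:
                end = space
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks


def summarise_long(text, invoke, max_chars=6000, overlap=200):
    """Summarise each chunk with `invoke`, then combine the partial summaries.

    `invoke` is any function that takes a prompt string and returns a string,
    for example `llm.invoke`.
    """
    partials = [
        invoke("Summarise this part of a conversation transcript:\n\n" + chunk)
        for chunk in chunk_text(text, max_chars, overlap)
    ]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    return invoke(
        "Combine these partial summaries into one coherent summary:\n\n"
        + "\n\n".join(partials)
    )
```

With the objects defined earlier in this notebook, this could be called as `summarise_long(transcript_E57, llm.invoke)`. Passing the model call in as a function keeps the helper independent of LangChain, so it can be tried with a stand-in function first.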