# Damax 🚀
## Local LLM to chat with your videos and audio
Ever felt overwhelmed by endless hours of podcasts, audiobooks, or videos? Imagine a podcast that’s 2 hours long or an audiobook that stretches to 40 hours! Do you really need to listen to every single word? Or spend 20 minutes on a product review just to know if it’s good or bad?

Not anymore! With Damax, you can focus on what truly matters. Get straight to the core information that adds value to your knowledge.

Damax lets you chat or talk with your video or audio, ask anything about it, get summaries, and more. Plus, you can compress audio to a shorter version, making it faster and easier to consume.

Save time, stay informed, and enjoy the essentials with Damax! 🚀

![image info](logos/1.jpg)

In [1]:
# Standard library
import os
import platform
import warnings

# Third-party
import speech_recognition as sr
import whisper
from bark import SAMPLE_RATE, generate_audio, preload_models
from gtts import gTTS
from IPython.display import Audio, display
from langchain import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.llms import Ollama
from moviepy.editor import VideoFileClip
from pydub import AudioSegment
from pydub.playback import play
from scipy.io.wavfile import write as write_wav

warnings.filterwarnings('ignore')



In [2]:
print('Enter the full path for your file:')
user_input = input()

Enter the full path for your file:


 video.mp4


In [3]:
basename = os.path.basename(user_input)
dirname = os.path.dirname(user_input)
basename_without_ext = os.path.splitext(os.path.basename(user_input))[0]

In [4]:
audio_path = "audio/"+basename_without_ext+".mp3"
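
The path handling in the two cells above can be wrapped in small helpers. This is only a sketch: `derive_audio_path` and `is_video` are hypothetical names that do not appear in the notebook, and the `audio` folder and the `.mp4` extension are taken from the cells above.

```python
import os


def derive_audio_path(media_path, audio_dir="audio"):
    """Return the .mp3 path the extracted audio would be written to."""
    # Drop the directory part and the extension, keep only the file stem
    stem = os.path.splitext(os.path.basename(media_path))[0]
    return os.path.join(audio_dir, stem + ".mp3")


def is_video(media_path, video_exts=(".mp4",)):
    """Check the real file extension instead of searching the whole name."""
    return os.path.splitext(media_path)[1].lower() in video_exts


print(derive_audio_path("video.mp4"))
print(is_video("clips/Talk.MP4"), is_video("mp4_notes.mp3"))
```

Comparing the extension avoids treating a file such as `mp4_notes.mp3` as a video only because its name contains `mp4`.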

In [5]:
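try:
    # Only video files need their audio track extracted
    if basename.lower().endswith(".mp4"):
        # A bare filename is looked up in the video/ folder
        video_path = user_input if dirname else os.path.join("video", basename)
        video = VideoFileClip(video_path)
        video.audio.write_audiofile(audio_path)
        video.close()
except Exception as e:
    print(f"Cannot find or process the file: {e}")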

MoviePy - Writing audio in audio/video.mp3
MoviePy - Done.




In [6]:
# Load the Whisper model
model = whisper.load_model("base")

# Transcribe the audio file
transcript = model.transcribe(audio=audio_path, word_timestamps=True)

# Print the transcription with timestamps
for segment in transcript['segments']:
    start_minutes, start_seconds = divmod(segment["start"], 60)
    timestamp = f"{int(start_minutes):02d}:{int(start_seconds):02d}"
    text = segment["text"]
    segment["timestamp"] = timestamp
    print(f"{timestamp} - {text}")

00:00 -  A person who thinks all the time has nothing to think about except thoughts.
00:09 -  So he loses touch with reality and lives in a world of illusions.
00:18 -  By thoughts I mean specifically, chatter in the skull, perpetual and compulsive repetition
00:25 -  of words, of reckoning and calculating.
00:32 -  I'm not saying that thinking is bad.
00:36 -  Like everything else is useful in moderation.
00:40 -  A good servant but a bad master.
00:43 -  And all so-called civilized peoples have increasingly become crazy and self-destructive
00:50 -  because through excessive thinking they have lost touch with reality.
00:56 -  That's to say, we confuse signs, words, numbers, symbols and ideas with the real world.
01:07 -  Most of us would have rather money than tangible wealth.
01:13 -  And a great occasion is somehow spoiled for us unless photographed.
01:19 -  And to read about it the next day in the newspaper is oddly more fun for us than the original event.
01:30 -  This is a di
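
The `MM:SS` formatting above works for short clips, but the minutes keep growing past 59 on recordings longer than an hour. A possible variant that adds an hours field when needed is sketched below; `format_timestamp` is a hypothetical helper, not part of the notebook.

```python
def format_timestamp(seconds):
    """Format a number of seconds as MM:SS, or HH:MM:SS from one hour up."""
    total = int(seconds)  # drop the fractional part, like the cell above
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


print(format_timestamp(9.4))   # 00:09
print(format_timestamp(137))   # 02:17
print(format_timestamp(3725))  # 01:02:05
```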

In [7]:
# Combine all segment texts into one string
transcription_text = " ".join(segment["timestamp"]+": "+segment["text"] for segment in transcript['segments'])
transcription_text

"00:00:  A person who thinks all the time has nothing to think about except thoughts. 00:09:  So he loses touch with reality and lives in a world of illusions. 00:18:  By thoughts I mean specifically, chatter in the skull, perpetual and compulsive repetition 00:25:  of words, of reckoning and calculating. 00:32:  I'm not saying that thinking is bad. 00:36:  Like everything else is useful in moderation. 00:40:  A good servant but a bad master. 00:43:  And all so-called civilized peoples have increasingly become crazy and self-destructive 00:50:  because through excessive thinking they have lost touch with reality. 00:56:  That's to say, we confuse signs, words, numbers, symbols and ideas with the real world. 01:07:  Most of us would have rather money than tangible wealth. 01:13:  And a great occasion is somehow spoiled for us unless photographed. 01:19:  And to read about it the next day in the newspaper is oddly more fun for us than the original event. 01:30:  This is a disaster. 01:34

In [8]:
with open("text/output.txt", "w", encoding="utf-8") as text_file:
    text_file.write(transcription_text)

# Load the transcribed text
loader = TextLoader("text/output.txt", encoding='UTF-8')
docs = loader.load()

# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=7000,
    chunk_overlap=500
)
splits = text_splitter.split_documents(docs)
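
To see what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-width splitter. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that one prefers to break on separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical helper written only to illustrate the two parameters.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
print(len(split_with_overlap("a short transcript", chunk_size=7000, chunk_overlap=500)))
# 1
```

With `chunk_size=7000`, a transcript of a few minutes fits in a single chunk, which matches the "number of elements in index 1" message printed by the question cells further down. A smaller chunk size would give the retriever more, narrower passages to choose from.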

In [9]:
# Create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Initialize Chroma for vector storage and load it with the document splits
db = Chroma.from_documents(splits, embedding_function)
retriever = db.as_retriever()

# Initialize the LLaMA 3 LLM from Ollama
ollama = Ollama(base_url='http://localhost:11434', model="llama3")
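
Conceptually, the retriever embeds the question and returns the chunks whose embeddings are closest to it. The sketch below shows that idea with plain lists and cosine similarity. It is an illustration only: Chroma's internal index and distance metric may differ, and `cosine` and `top_k` are hypothetical helpers. The default of 4 results is taken from the log message shown in the question cells below.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k most similar document vectors, best first."""
    k = min(k, len(doc_vecs))  # cannot return more results than stored chunks
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


print(top_k([1, 0], [[0, 1], [1, 0], [1, 1]], k=2))  # [1, 2]
print(top_k([1, 0], [[1, 0]]))                       # [0]
```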

In [10]:
# Define the prompt template
template = """Use the following pieces of context to answer the question at the end. The context may come from an audio file, a video, a meeting, or an audiobook,
and it is annotated with timestamps. Mention a timestamp only when the user asks about it.
If you don't know the answer, just say that you don't know; don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=["context", "question"], template=template)

# Create a RetrievalQA chain that uses the custom prompt
qachain = RetrievalQA.from_chain_type(
    llm=ollama,
    retriever=retriever,
    chain_type="stuff",
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
)
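
A "stuff" chain places the retrieved chunks into the `{context}` slot of a prompt and sends the result to the LLM. The sketch below shows roughly what that filled prompt looks like, using plain string formatting. `build_prompt` and the shortened template are hypothetical, and the blank-line separator between chunks is an assumption about how the chunks are joined.

```python
# Shortened stand-in for the template defined above (hypothetical wording)
SHORT_TEMPLATE = """Use the context to answer the question.
{context}
Question: {question}
Helpful Answer:"""


def build_prompt(chunks, question, template=SHORT_TEMPLATE):
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


prompt = build_prompt(
    ["00:32:  I'm not saying that thinking is bad."],
    "What is the main topic?",
)
print(prompt)
```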

In [11]:
def gen_answer(question):
    print("Could you give me a moment? I'm thinking..")
    res = qachain.invoke({"query": question})
    print(res['result'])
    return res['result']

In [12]:
def gen_audio(answer):
    print("Let me generate an audio for the answer, so you could listen to it ^_^")
    audio_array = generate_audio(answer)
    display(Audio(audio_array, rate=SAMPLE_RATE))
    # save audio to disk
    write_wav("gen_audio/bark_generation.wav", SAMPLE_RATE, audio_array)

In [13]:
def text_to_speech(text):
    # Create a gTTS object
    tts = gTTS(text=text, lang='en')

    # Save the audio file
    audio_file = "output.mp3"
    tts.save(audio_file)
    if platform.system() == 'Linux' and 'microsoft' in platform.uname().release.lower():
        # Play the audio file using ffplay
        os.system(f'ffplay -autoexit -nodisp {audio_file}')
    else: 
        # Load and play the audio file
        audio = AudioSegment.from_mp3(audio_file)
        play(audio)
    # Clean up
    os.remove(audio_file)
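
The WSL check inside `text_to_speech` can be pulled into a helper so it is easy to try with different values. `is_wsl` is a hypothetical name; the optional arguments exist only so the check can be exercised without running on each platform.

```python
import platform


def is_wsl(system=None, release=None):
    """Return True when running on Linux under Windows Subsystem for Linux."""
    system = platform.system() if system is None else system
    release = platform.uname().release if release is None else release
    # WSL kernels report a release string that contains "microsoft"
    return system == "Linux" and "microsoft" in release.lower()


print(is_wsl("Linux", "5.15.146.1-microsoft-standard-WSL2"))  # True
print(is_wsl("Linux", "6.5.0-generic"))                       # False
```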

In [14]:
def speech_recognition():
    # Initialize the recognizer
    recognizer = sr.Recognizer()
    
    # Use the microphone as the source of audio
    with sr.Microphone() as source:
        print("Adjusting for ambient noise. Please wait...")
        # Adjusts the recognizer sensitivity to ambient noise
        recognizer.adjust_for_ambient_noise(source)
        
        print("Listening...")
        # Capture audio from the microphone
        audio = recognizer.listen(source)
        
        try:
            # Recognize the speech using Google's speech recognition
            print("Recognizing...")
            text = recognizer.recognize_google(audio)
            print("You said: " + text)
            # Hand the recognized text back to the caller
            return text
        
        except sr.UnknownValueError:
            # The speech was unintelligible
            print("Sorry, I could not understand the audio.")
        
        except sr.RequestError:
            # The API request failed
            print("Sorry, there was an issue with the request.")

    # Nothing was recognized
    return None
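
One way to turn the pieces above into a spoken conversation is a small loop that listens, answers, and speaks. The sketch below takes the three steps as plain functions so it can be tried without a microphone. `voice_chat`, its parameters, and the stop words are all hypothetical, and it assumes `listen` returns the recognized text, or `None` when nothing was understood.

```python
def voice_chat(listen, answer, speak, max_turns=10, stop_words=("stop", "exit", "quit")):
    """Run a listen -> answer -> speak loop and return the (question, reply) pairs."""
    history = []
    for _ in range(max_turns):
        question = listen()
        if not question:
            continue  # nothing was recognized, try again
        if question.strip().lower() in stop_words:
            break
        reply = answer(question)
        speak(reply)
        history.append((question, reply))
    return history


# Stand-ins for the microphone, the LLM and the speaker
scripted = iter(["What is the main topic?", None, "stop", "never reached"])
spoken = []
history = voice_chat(
    listen=lambda: next(scripted),
    answer=lambda q: "You asked: " + q,
    speak=spoken.append,
)
print(history)
```

In the notebook, functions such as `gen_answer` and `text_to_speech` could be passed as `answer` and `speak`.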

In [15]:
# Ask a question
answer = gen_answer("What is the main topic?")
text_to_speech(answer)
gen_audio(answer)

Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1


Could you give me a moment? I'm thinking..
The main topic of this passage appears to be the concept of reality and how it relates to thinking and human perception. The speaker seems to be criticizing the tendency for people to become overly consumed with abstract concepts, such as thoughts and ideas, at the expense of connecting with the physical world around them. They argue that excessive thinking can lead to a disconnection from reality, causing problems in our personal and collective lives.
5.15.146.1-microsoft-standard-wsl2


ffplay version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2003-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enab


Let me generate an audio for the answer, so you could listen to it ^_^


100%|██████████| 100/100 [00:12<00:00,  7.90it/s]
100%|██████████| 35/35 [00:52<00:00,  1.49s/it]


In [16]:
# Ask a question
answer = gen_answer("Summarise")
text_to_speech(answer)
gen_audio(answer)

Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1


Could you give me a moment? I'm thinking..
Based on the provided context, it appears that the speaker is lamenting the effects of excessive thinking and mental abstraction on one's relationship with reality. The speaker suggests that when we think too much, we lose touch with the real world and become disconnected from nature.

The speaker also argues that many people prioritize intangible things like money or symbols over tangible wealth, and that even experiences are often reduced to mere signs (such as photographs) rather than being fully engaged with in the moment.



Let me generate an audio for the answer, so you could listen to it ^_^


100%|██████████| 100/100 [00:09<00:00, 10.64it/s]
100%|██████████| 35/35 [00:47<00:00,  1.35s/it]


In [17]:
# Ask a question
answer = gen_answer("When was the sentence 'Reality is this' mentioned?")
text_to_speech(answer)
gen_audio(answer)

Number of requested results 4 is greater than number of elements in index 1, updating n_results = 1


Could you give me a moment? I'm thinking..
According to the provided context, the sentence "Reality is this" is mentioned at time 02:17.


Let me generate an audio for the answer, so you could listen to it ^_^


100%|██████████| 100/100 [00:06<00:00, 14.99it/s]
100%|██████████| 17/17 [00:25<00:00,  1.53s/it]


![image info](logos/2.jpg)
![image info](logos/3.jpg)
![image info](logos/4.jpg)