### Prompt

In [1]:
from langchain.prompts import ChatPromptTemplate
from IPython.display import Markdown, display

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know". Return result with Markdown formatting.

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
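
As a rough illustration of what the template does before any text reaches the model, the sketch below fills the same two placeholders with plain `str.format`. This is only an approximation: `ChatPromptTemplate` also wraps the rendered text in a chat message, and the context and question values here are sample inputs chosen for the example.

```python
# Plain-Python approximation of the placeholder substitution (not LangChain itself).
template_text = """
Answer the question based on the context below. If you can't
answer the question, reply "I don't know". Return result with Markdown formatting.

Context: {context}

Question: {question}
"""

rendered = template_text.format(
    context="Peter loves Mary. Mary loves Silvia.",  # sample context, assumed for illustration
    question="what is 1 + 1",                        # sample question, assumed for illustration
)
print(rendered)
```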

### Model (llama3)

In [2]:
from langchain_community.llms import Ollama # type: ignore
llm = Ollama(model="llama3")

### Str Parser

In [3]:
from langchain_core.output_parsers import StrOutputParser
parser = StrOutputParser()

In [4]:
# chain = prompt | llm | parser
# result = chain.invoke({"context": "Peter loves Mary. Mary loves Silvia. Silvia loves Peter. Mary doesn't love Peter",
#                         "question": "what is 1 + 1"})
# display(Markdown(result))

In [5]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_chroma import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text")
chroma = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
chroma.similarity_search("Why LLMs are so popular?")

[Document(page_content='A Comprehensive Overview from Training to Inference\n4.2. Memory Scheduling\nDeployingLLMsonasingleconsumer-gradeGPUisconstrainedbythelimitationsoftheavailablevideomemory,\ngiventhesubstantialparametersofLLMs.Therefore,appropriateMemorySchedulingstrategiescanbeusedtosolve\nthe hardware limitations of large model inference. Memory scheduling in large model inference involves the efficient\norganization and management of memory access patterns during the reasoning or inference phase of complex neural\nnetwork models. In the context of sophisticated reasoning tasks, such as natural language understanding or complex\ndecision-making, large models often have intricate architectures and considerable memory requirements. Memory\nschedulingoptimizestheretrievalandstorageofintermediaterepresentations,modelparameters,andactivationvalues,\nensuring that the inference process is both accurate and performed with minimal latency. For example, BMInf [184]\nutilizes the princip
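
To make the idea behind `similarity_search` more concrete, here is a minimal sketch of ranking stored texts by vector similarity. Everything in it is an assumption for illustration: the three-dimensional vectors are made up, the store is a plain dictionary, and cosine similarity is just one common choice (the metric a real Chroma collection uses depends on how it is configured).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings; real embedding models produce far longer vectors.
toy_store = {
    "memory scheduling for LLM inference": [0.9, 0.1, 0.0],
    "history of the transformer": [0.2, 0.8, 0.1],
    "cooking pasta": [0.0, 0.1, 0.9],
}
query_vector = [0.8, 0.3, 0.0]  # hypothetical embedding of the user's query

def top_k(query, store, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

results = top_k(query_vector, toy_store)
print(results)
```

A vector store does the same kind of ranking, but over embeddings produced by the embedding model and usually with an approximate index instead of a full sort.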

### Final chain

In [6]:
from langchain_core.runnables import RunnablePassthrough
chain = (
    {"context": chroma.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | llm
    | parser
)
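
The data flow through this chain can be sketched in plain Python. The sketch is a simplification under stated assumptions: `fake_retriever` and `fake_llm` are hypothetical stand-ins for the Chroma retriever and the llama3 model, the prompt is shortened, and the retrieved chunks are joined into one string, whereas the chain above hands the retriever output to the prompt as-is.

```python
PROMPT = "Context: {context}\n\nQuestion: {question}\n"  # shortened prompt, for illustration

def fake_retriever(question):
    # Hypothetical stand-in for chroma.as_retriever(): always returns the same chunks.
    return ["LLMs are trained on large text corpora.", "Transformers use self-attention."]

def fake_llm(prompt_text):
    # Hypothetical stand-in for the model: only reports how much text it received.
    return f"(model saw {len(prompt_text)} characters)"

def build_prompt(question):
    # The dict step: the same input goes to every branch.
    # "context" runs the retriever, "question" passes the input through unchanged.
    inputs = {"context": fake_retriever(question), "question": question}
    # The prompt step: fill the placeholders with the collected values.
    return PROMPT.format(
        context="\n\n".join(inputs["context"]),
        question=inputs["question"],
    )

def run_chain(question):
    # The model step, followed by the parser step (here just a string conversion).
    return str(fake_llm(build_prompt(question)))

print(build_prompt("Why are LLMs popular?"))
print(run_chain("Why are LLMs popular?"))
```

Joining the chunk texts is a design choice made for this sketch; it keeps document metadata out of the prompt, at the cost of an extra formatting step.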


### Inference

In [7]:
display(Markdown(chain.invoke("Why LLMs are so popular?")))

Based on the context, it seems that Large Language Models (LLMs) are gaining popularity due to their ability to continue expanding and augmenting their learning capabilities, thereby increasing their overall performance. Additionally, the majority of currently available LLMs are confined to a single natural language modality, which is expected to evolve towards handling multimodal data like images and audio in the future. This evolution would enable models to comprehensively understand and generate multimodal content, significantly broadening the applications scope of LLMs.

Furthermore, researchers in the field of AI working on LLM development are recognizing the importance of collaboration with professionals from diverse fields to address the challenges inherent in developing these complex models.

In [9]:
display(Markdown(chain.invoke("Describe what is a transformer?")))

Based on the context provided, a Transformer is a type of neural network architecture that allows for parallel processing and self-attention mechanisms to handle complex patterns in sequential data such as text.

In particular, it calculates a weighted sum of all words in a sentence where the weights are determined by the relevance of each word to the target word. This allows the model to weigh the importance of different words in a sentence when predicting a particular word.

### Testing an out-of-context question

In [8]:

result = chain.invoke("When first astronauts landed on the moon?")
display(Markdown(result))

I'm not able to answer your question based on the context provided. It seems that the document is related to large language models and their applications in various fields, but it does not contain information about the first astronauts landing on the moon.

The document appears to be a collection of research papers and preprints on topics such as natural language processing, mathematical problem-solving, and question answering. If you're looking for information about the first astronauts to land on the moon, I'd be happy to help you with that!