### chunk_into

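For each start index `ii` from 0 to `n_chunks - 1`, take every `n_chunks`-th element beginning at `ii`, where `n_chunks` is the number of chunks you want. The chunks are interleaved (round-robin) rather than contiguous, and their lengths differ by at most one.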

In [3]:
def chunk_into(items, n_chunks):
    """Split items into n_chunks interleaved (non-contiguous) pieces."""
    # Chunk ii holds items[ii], items[ii + n_chunks], items[ii + 2 * n_chunks], ...
    for ii in range(n_chunks):
        yield items[ii::n_chunks]


list(chunk_into([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))

[[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
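To make the difference from ordinary chunking concrete, here is a sketch that puts `chunk_into` next to a contiguous splitter. `chunk_contiguous` is a hypothetical helper added only for comparison; it is not part of this notebook.

```python
def chunk_into(items, n_chunks):
    """Interleaved split: chunk ii holds items[ii], items[ii + n_chunks], ..."""
    for ii in range(n_chunks):
        yield items[ii::n_chunks]


def chunk_contiguous(items, n_chunks):
    """Hypothetical contiguous split into n_chunks consecutive slices.

    The first len(items) % n_chunks slices get one extra element, so slice
    lengths differ by at most one, as with chunk_into.
    """
    size, extra = divmod(len(items), n_chunks)
    start = 0
    for ii in range(n_chunks):
        end = start + size + (1 if ii < extra else 0)
        yield items[start:end]
        start = end


data = list(range(1, 11))
print(list(chunk_into(data, 3)))        # [[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
print(list(chunk_contiguous(data, 3)))  # [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
```

Both return every element exactly once; they differ only in which elements end up sharing a chunk.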

### FAISS

The minimal steps for building a corpus and asking a question about it are implemented below:
1. Build a FAISS index from the documents' embeddings
2. Take a question and embed it with the same embedding model
3. Search the index for the document vectors most similar to the question vector
4. Build a prompt from the most similar documents
5. Ask ChatGPT and return its answer
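The first four steps can be traced without an API key in a toy version. This is a sketch under loudly stated assumptions: `embed` is a bag-of-words stand-in for `OpenAIEmbeddings`, `FlatIndex` is a hypothetical brute-force search in the spirit of an exact FAISS index, the document texts are shortened paraphrases of the two abstracts used below, and the prompt wording is invented.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: unit-length bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def dot(u, v):
    """Inner product of two sparse vectors stored as {word: weight} dicts."""
    return sum(weight * v.get(word, 0.0) for word, weight in u.items())


class FlatIndex:
    """Exact (brute-force) search over every stored vector."""

    def __init__(self):
        self.vectors, self.docs = [], []

    def add(self, doc):
        # Step 1: embed each document once and keep it alongside its metadata.
        self.vectors.append(embed(doc["text"]))
        self.docs.append(doc)

    def search(self, query_vector, k):
        # Step 3: score every document against the query and keep the top k.
        scored = [(dot(query_vector, v), d) for v, d in zip(self.vectors, self.docs)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(question, hits):
    # Step 4: paste the retrieved documents and their sources into one prompt.
    context = "\n\n".join(f"Content: {d['text']}\nSource: {d['source']}" for _, d in hits)
    return f"{context}\n\nQuestion: {question}\nAnswer, citing the source:"


docs = [
    {"text": "Large multimodal models evaluated on OCR and text recognition",
     "source": "https://arxiv.org/abs/2305.07895"},
    {"text": "The text prompt and the resulting image form an art piece",
     "source": "https://arxiv.org/abs/2305.07896"},
]
index = FlatIndex()
for doc in docs:
    index.add(doc)

question = "What is art prompt?"
hits = index.search(embed(question), k=1)  # Step 2: encode the question the same way
prompt = build_prompt(question, hits)
print(hits[0][1]["source"])  # https://arxiv.org/abs/2305.07896
```

Step 5 would send `prompt` to the chat model; that call is left out so the sketch runs offline.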

In [13]:
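import os

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Keeps an existing OPENAI_API_KEY; otherwise replace the placeholder with your key.
os.environ.setdefault("OPENAI_API_KEY", "put your key here")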

embedding_engine = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    allowed_special="all",
)
texts = [
    """ON THE HIDDEN MYSTERY OF OCR IN
3Microsoft Research, Redmond
Large models have recently played a dominant role in natural language processing and multimodal
vision-language learning. It remains less explored about their efficacy in text-related visual tasks. We
conducted a comprehensive study of existing publicly available multimodal models, evaluating their
performance in text recognition (scene text, artistic text, handwritten text), text-based visual question
answering (document text, scene text, and bilingual text), key information extraction (receipts,
documents, and nutrition facts) and handwritten mathematical expression recognition. Our findings
reveal strengths and weaknesses in these models, which primarily rely on semantic understanding for
word recognition and exhibit inferior perception of combinations of characters without semantic. They
also display indifference towards text length and have limited capabilities in detecting fine-grained
features in images. Consequently, these results demonstrate that even the current most powerful
large multimodal models cannot match domain-specific methods in traditional text tasks and face
greater challenges in more complex tasks. Most importantly, the baseline results showcased in this
study could provide a foundational framework for the conception and assessment of innovative
strategies targeted at enhancing zero-shot multimodal techniques. Evaluation pipeline is available at
https://github.com/Yuliang-Liu/MultimodalOCR .""",
    """The Prompt Artists
This paper examines the art practices, artwork, and motivations of prolific users of the latest generation of text-to-image
models. Through interviews, observations, and a user survey, we present a sampling of the artistic styles and describe the
developed community of practice around generative AI. We find that: 1) the text prompt andthe resulting image can be
considered collectively as an art piece ( prompts as art ), and 2) prompt templates (prompts with “slots” for others to fill in with
their own words) are developed to create generative art styles . We discover that the value placed by this community on unique
outputs leads to artists seeking specialized vocabulary to produce distinctive art pieces (e.g., by reading architectural blogs to
find phrases to describe images). We also find that some artists use “glitches” in the model that can be turned into artistic
styles of their own right. From these findings, we outline specific implications for design regarding future prompting and
image editing options""",
]

metadatas = [
    {"title": "a", "source": "https://arxiv.org/abs/2305.07895"},  # "source" is required: the QA-with-sources chain cites it in the prompt
    {"title": "b", "source": "https://arxiv.org/abs/2305.07896"},
]

index = FAISS.from_texts(texts=texts, embedding=embedding_engine, metadatas=metadatas)
print(index.index.ntotal)
query = "What is art prompt?"
sources_and_scores = index.similarity_search_with_score(query, k=1)
sources, _ = zip(*sources_and_scores)
sources

2


(Document(page_content='The Prompt Artists\nThis paper examines the art practices, artwork, and motivations of prolific users of the latest generation of text-to-image\nmodels. Through interviews, observations, and a user survey, we present a sampling of the artistic styles and describe the\ndeveloped community of practice around generative AI. We find that: 1) the text prompt andthe resulting image can be\nconsidered collectively as an art piece ( prompts as art ), and 2) prompt templates (prompts with “slots” for others to fill in with\ntheir own words) are developed to create generative art styles . We discover that the value placed by this community on unique\noutputs leads to artists seeking specialized vocabulary to produce distinctive art pieces (e.g., by reading architectural blogs to\nfind phrases to describe images). We also find that some artists use “glitches” in the model that can be turned into artistic\nstyles of their own right. From these findings, we outline specific 
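`similarity_search_with_score` returns a score alongside each document. In the LangChain versions we are aware of, the FAISS wrapper builds an exact `IndexFlatL2`, so the score is a squared L2 distance and lower means more similar. OpenAI states that its embeddings are normalised to length 1, and for unit vectors the squared distance is a fixed function of cosine similarity, `||a - b||^2 = 2 - 2 * cos(a, b)`, so ranking by distance and ranking by cosine agree. The sketch below checks that identity on made-up 3-dimensional vectors; the numbers are illustrative, not real embeddings.

```python
import math


def normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Made-up unit vectors standing in for a query and two document embeddings.
query = normalise([0.2, 0.9, 0.1])
docs = {
    "ocr": normalise([0.9, 0.1, 0.3]),
    "prompt_artists": normalise([0.3, 0.8, 0.2]),
}

for name, vec in docs.items():
    # For unit vectors the two columns agree up to floating-point rounding.
    print(name, round(squared_l2(query, vec), 6), round(2 - 2 * cosine(query, vec), 6))

best = min(docs, key=lambda name: squared_l2(query, docs[name]))
print(best)  # prompt_artists
```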

In [14]:
from langchain.chat_models import ChatOpenAI
from langchain.chains.qa_with_sources import load_qa_with_sources_chain

import prompts  # local module (not shown); prompts.main must accept `sources` and `question`

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, max_tokens=256)
# "stuff" puts all retrieved documents into one prompt, filling its `sources` variable
chain = load_qa_with_sources_chain(
    llm,
    chain_type="stuff",
    verbose=False,
    prompt=prompts.main,
    document_variable_name="sources",
)

result = chain(
    {"input_documents": sources, "question": query}, return_only_outputs=True
)
answer = result["output_text"]
answer

'Art prompt refers to the practice of using text-to-image models to generate artwork based on a given text prompt. The resulting image and the text prompt are considered collectively as an art piece. Artists in this community develop prompt templates with "slots" for others to fill in with their own words, creating generative art styles. Artists also seek specialized vocabulary to produce distinctive art pieces and may incorporate "glitches" in the model as artistic styles. This practice has implications for future prompting and image editing options in design.\nSOURCE: https://arxiv.org/abs/2305.07896'
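The "stuff" chain type formats every retrieved document, concatenates them, and passes the result to the prompt through the variable named by `document_variable_name` (here `sources`). Because the content of `prompts.main` is not shown, `MAIN_TEMPLATE` below is a hypothetical stand-in. `DOCUMENT_TEMPLATE` mirrors the `Content:`/`Source:` layout that, to our understanding, LangChain's QA-with-sources document prompt uses, which is why each document needs a `source` in its metadata. The hypothetical `split_answer` helper separates the answer from the trailing `SOURCE:` line seen in the output above.

```python
# Hypothetical stand-ins: the real prompts.main is not shown in this notebook.
DOCUMENT_TEMPLATE = "Content: {page_content}\nSource: {source}"
MAIN_TEMPLATE = (
    "Answer the question using only the sources below, then cite one.\n\n"
    "{sources}\n\nQuestion: {question}\nAnswer:"
)


def stuff_prompt(docs, question):
    """Format every document, join them, and fill the `sources` slot."""
    formatted = [
        DOCUMENT_TEMPLATE.format(page_content=d["page_content"], source=d["metadata"]["source"])
        for d in docs
    ]
    return MAIN_TEMPLATE.format(sources="\n\n".join(formatted), question=question)


def split_answer(output_text, marker="SOURCE:"):
    """Split the model output into (answer, source) at the last marker."""
    answer, sep, source = output_text.rpartition(marker)
    if not sep:  # no citation found
        return output_text.strip(), None
    return answer.strip(), source.strip()


docs = [{
    "page_content": "The text prompt and the resulting image form an art piece.",
    "metadata": {"source": "https://arxiv.org/abs/2305.07896"},
}]
prompt = stuff_prompt(docs, "What is art prompt?")

# Shortened sample output in the same shape as the chain's answer above.
answer, source = split_answer(
    "Art prompt treats the text prompt and image as one art piece.\n"
    "SOURCE: https://arxiv.org/abs/2305.07896"
)
print(source)  # https://arxiv.org/abs/2305.07896
```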

### Gradio

In [2]:
import gradio as gr

def greet(name):
    return "Hello " + name + "!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")
# share=True also creates a temporary public *.gradio.live link
demo.launch(share=True)

Running on local URL:  http://127.0.0.1:7861
Running on public URL: https://e5897120767648e855.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




In [9]:
inputs = gr.TextArea(
    label="Question",
    value="What is zero-shot chain-of-thought prompting?",
    show_label=True,
)
outputs = gr.TextArea(
    label="Answer", value="The answer will appear here.", show_label=True
)

def qanda(query: str) -> str:
    # Placeholder: always returns "hello"; the FAISS retrieval and QA chain are not wired in yet.
    return "hello"

interface = gr.Interface(
    fn=qanda,
    inputs=inputs,
    outputs=outputs,
    title="Ask Questions About The Full Stack.",
    description="Get answers with sources from an LLM.",
    examples=[
        "What is zero-shot chain-of-thought prompting?",
        "Would you rather fight 100 LLaMA-sized GPT-4s or 1 GPT-4-sized LLaMA?",
        "What are the differences in capabilities between GPT-3 davinci and GPT-3.5 code-davinci-002?",  # noqa: E501
        "What is PyTorch? How can I decide whether to choose it over TensorFlow?",
        "Is it cheaper to run experiments on cheap GPUs or expensive GPUs?",
        "How do I recruit an ML team?",
        "What is the best way to learn about ML?",
    ],
    allow_flagging="never",
    theme=gr.themes.Default(radius_size="none", text_size="lg"),
)
interface.launch()

Running on local URL:  http://127.0.0.1:7864

To create a public link, set `share=True` in `launch()`.
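`qanda` above is still a stub. One way to connect it to the retrieval and QA steps from the FAISS section is to build it from two callables, which also makes it testable without network access. `make_qanda` is a hypothetical helper; in the notebook, `search` could wrap `index.similarity_search_with_score` and `answer` could wrap `chain`, as sketched in the commented-out lines.

```python
from typing import Any, Callable, List, Sequence, Tuple


def make_qanda(
    search: Callable[[str, int], Sequence[Tuple[Any, float]]],
    answer: Callable[[List[Any], str], str],
    k: int = 1,
) -> Callable[[str], str]:
    """Build a qanda(query) function for gr.Interface from a retriever and an answerer."""

    def qanda(query: str) -> str:
        hits = search(query, k)  # [(document, score), ...]
        docs = [doc for doc, _score in hits]
        return answer(docs, query)

    return qanda


# In the notebook this might be wired up as (not run here):
# qanda = make_qanda(
#     search=lambda q, k: index.similarity_search_with_score(q, k=k),
#     answer=lambda docs, q: chain(
#         {"input_documents": docs, "question": q}, return_only_outputs=True
#     )["output_text"],
# )


# Offline check with fake components.
def fake_search(query, k):
    return [("doc about prompts", 0.1), ("doc about OCR", 0.9)][:k]


def fake_answer(docs, query):
    return f"{query} -> {docs}"


qanda = make_qanda(fake_search, fake_answer)
print(qanda("What is art prompt?"))  # What is art prompt? -> ['doc about prompts']
```

The resulting `qanda` can be passed as `fn=` to `gr.Interface` in place of the stub.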


