In [1]:
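# Read "NAME:value" pairs from a local key file, one pair per line.
with open('../../my_key.txt') as f:
    key_list = [line.strip() for line in f]

keys = {}
for line in key_list:
    if not line:
        continue  # skip blank lines
    # Split on the first ':' only, so values that contain ':' stay intact.
    name, _, value = line.partition(':')
    keys[name] = value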

In [None]:
import os

os.environ['OPENAI_API_KEY'] = keys['OPENAI_API_KEY']

# Map Reduce Information Engine
- Summarize information with Map Reduce
- I/O
    - Input: text to summarize
    - Output: structured information

## With Langchain
- Problem:
    - Change to gpt-3.5-turbo
    - Text Splitter using tiktoken
    - Need separate pages so that we can add a 'source' to each

In [12]:
from langchain import OpenAI, PromptTemplate, LLMChain
from langchain.text_splitter import RecursiveCharacterTextSplitter

openai_llm = OpenAI(
    temperature = 0,
    max_tokens = 1000,
    model_name = 'text-davinci-003',
    top_p = 1
)

### Text Splitter

In [13]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 20,
)
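
To make the two parameters concrete, here is a simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators before falling back to raw characters); `split_with_overlap` is a hypothetical helper that only illustrates what `chunk_size` and `chunk_overlap` control.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=20):
    """Cut text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(demo_chunks)
```

Each chunk after the first starts with the last `chunk_overlap` characters of the previous one, which is what keeps a sentence cut at a boundary readable in both chunks.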

In [14]:
with open('text1.txt', 'r') as f:
    text1_processed = f.read()

with open('text2.txt', 'r') as f:
    text2_processed = f.read()


texts1 = text_splitter.split_text(text1_processed)
texts2 = text_splitter.split_text(text2_processed)

In [15]:
print(texts1[0] + '\n' + 'Come from: #url1 \n')

GPT-3 vs. BERT: Comparing the Two Most Popular Language Models
Natural language processing (NLP) has come a long way over the past few years. With the development of powerful new models such as GPT-3 and BERT, it's now possible to create sophisticated applications that can understand and interact with human language.
However, what went viral as a disruptive chatbot with ChatGPT, suddenly became a contest of language models to power AI content. So, we decided to oppose GPT-3 vs. BERT to understand their differences and similarities, explore their capabilities, and discuss some of the tools that use them.
GPT-3 (Generative Pre-trained Transformer 3) is an autoregressive language model developed by OpenAI. It was trained on a dataset of 45TB of text data from sources such as Wikipedia, books, and webpages. The model is capable of generating human-like text when given a prompt. It can also be used for tasks such as question answering, summarization, translation, and more.
Come from: #url1 

In [16]:
# Append a source line to every chunk so the map prompt can cite it.
texts1 = [t + '\n' + 'Come from: #url1 \n' for t in texts1]
texts2 = [t + '\n' + 'Come from: #url2 \n' for t in texts2]

In [17]:
texts = texts1 + texts2

In [18]:
from langchain.docstore.document import Document

docs = [Document(page_content=t) for t in texts]
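
The source line is baked into each chunk's text so that the map prompt can echo it. A sketch of an alternative that keeps the source as a separate field and renders the tagged text in one place; `SourcedChunk` and `tag_chunks` are hypothetical names, not part of LangChain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedChunk:
    """A text chunk plus the label of the page it came from (hypothetical type)."""
    text: str
    source: str

    def render(self) -> str:
        # Same layout as the tagged strings in this notebook:
        # chunk text, newline, then the source line.
        return f"{self.text}\nCome from: #{self.source} \n"


def tag_chunks(chunks, source):
    """Attach one source label to every chunk of a page."""
    return [SourcedChunk(text=c, source=source) for c in chunks]


tagged = tag_chunks(["first chunk", "second chunk"], "url1")
rendered = [c.render() for c in tagged]
print(rendered[0])
```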

### Summarize Chain

In [19]:
from langchain.chains.summarize import load_summarize_chain

In [20]:
summarize_template = """Write a summary of the following text:
You must include what the text if from in the end of the summary, for example, [from url1] or [from url2]
{text}

Summary: 
"""

summarize_prompt = PromptTemplate(
    template = summarize_template,
    input_variables = ["text"]
)

combine_template = """Write a summary in detail of the following text:
Each bullet point should start with a title in the format of [title], and end with its source in the format of [url1, url2 ...]
Not all of the text are relevant, you should not summarize the irrelevant text.

Example:
[Make Decision] When you get the error message, you should make a decision. [url3, url6]

text:
<{text}>

Summary (in detail):
"""

combine_prompt = PromptTemplate(
    template = combine_template,
    input_variables = ["text"]
)

In [21]:
map_reduce_chain = load_summarize_chain(
    llm = openai_llm,
    chain_type = "map_reduce",
    map_prompt = summarize_prompt,
    combine_prompt = combine_prompt,
    combine_document_variable_name = "text",
    verbose = True
)
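
Conceptually, the `map_reduce` chain type summarizes each document on its own (map) and then summarizes the joined partial summaries (reduce). A minimal sketch of that control flow, with a stand-in `fake_llm` in place of a real model call. Every name here is hypothetical, the blank-line separator between partial summaries is an assumption, and the sketch leaves out anything the real chain does beyond those two steps.

```python
from typing import Callable, List


def map_reduce_summarize(
    docs: List[str],
    llm: Callable[[str], str],
    map_template: str,
    combine_template: str,
) -> str:
    """Summarize each doc separately, then summarize the joined summaries."""
    # Map step: one independent call per document.
    partial = [llm(map_template.format(text=d)) for d in docs]
    # Reduce step: a single call over all partial summaries.
    # Joining with a blank line is an assumption made for this sketch.
    joined = "\n\n".join(partial)
    return llm(combine_template.format(text=joined))


calls = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: records the prompt, returns a numbered tag."""
    calls.append(prompt)
    return f"<summary {len(calls)}>"


result = map_reduce_summarize(
    ["text of page one", "text of page two"],
    fake_llm,
    map_template="MAP\n{text}",
    combine_template="COMBINE\n{text}",
)
print(result)
print(calls[-1])
```

With two documents the model is called three times: twice in the map step and once in the reduce step, whose prompt contains only the two partial summaries.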

In [22]:
summary = map_reduce_chain.run(docs)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a summary of the following text:
You must include what the text if from in the end of the summary, for example, [from url1] or [from url2]
GPT-3 vs. BERT: Comparing the Two Most Popular Language Models
Natural language processing (NLP) has come a long way over the past few years. With the development of powerful new models such as GPT-3 and BERT, it's now possible to create sophisticated applications that can understand and interact with human language.
However, what went viral as a disruptive chatbot with ChatGPT, suddenly became a contest of language models to power AI content. So, we decided to oppose GPT-3 vs. BERT to understand their differences and similarities, explore their capabilities, and discuss some of the tools that use them.
GPT-3 (Generative Pre-trained Transformer 3) is an autoregressive language model developed by OpenAI. It 

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: The server had an error while processing your request. Sorry about that!.



> Finished chain.

> Finished chain.

> Finished chain.
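
The log above shows one call being retried after a `RateLimitError`. A generic sketch of retry with exponential backoff follows; `call_with_retry` and `TransientError` are hypothetical names, and this is not LangChain's own retry code.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable API error such as a rate limit."""


def call_with_retry(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on TransientError wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Demo: fail twice, then succeed. Delays are recorded instead of slept.
attempts = {"n": 0}
delays = []


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("rate limited")
    return "ok"


outcome = call_with_retry(flaky, sleep=delays.append)
print(outcome, delays)
```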


In [26]:
len(''' GPT-3 and BERT are two of the most popular language models used in natural language processing (NLP). This article compares the two models, exploring their capabilities and discussing the tools that use them. GPT-3 is an autoregressive language model developed by OpenAI, trained on 45TB of text data, and capable of generating human-like text. BERT is a transformer-based model that can be used for tasks such as question answering, summarization, and translation. [from url1]',
 ' GPT-3 and BERT are two popular AI content writing tools. GPT-3 is an autoregressive model while BERT is bidirectional, taking into account both left and right context when making predictions, making it better suited for sentiment analysis or natural language understanding tasks. [from url1]',
 ' GPT-3 and BERT are two large language models that use the Transformer architecture to learn context from textual-based datasets using attention mechanisms. GPT-3 has 1.5 billion parameters and was trained on 45TB of data, while BERT has 340 million parameters and was trained on 3TB of data. GPT-3 is significantly larger than BERT due to its much more extensive training dataset size (470 times bigger than the one used to train BERT). [from url1]',
 ' GPT-3 and BERT are unsupervised learning models that can perform various NLP tasks such as question answering, summarization, or translation. GPT-3 tends to outperform BERT in tasks such as summarization or translation due to its larger training dataset size, while BERT tends to do better in tasks such as sentiment analysis or NLU due to its bidirectional nature. [from url1]',
 ' GPT-3 and BERT are both powerful tools for Natural Language Processing (NLP) tasks, but each model is better suited for certain tasks than others. GPT-3 is better for summarization and translation, while BERT is better for sentiment analysis and NLU. The choice between the two models depends on the specific needs of the task. [from url1]',
 ' Stack Exchange network consists of 181 Q&A communities, including Stack Overflow, the largest online community for developers. Data Science Stack Exchange is a question and answer site for Data science professionals, Machine Learning specialists, and those interested in learning more about the field. The BERT paper explains that BERT is an encoder-only model, while the GPT paper explains that GPT is a decoder-only model. The difference between encoder and decoder blocks is that GPT Decoder looks only at previously generated tokens and learns from them, while BERT Encoder gives attention to tokens on both sides. [from url2]',
 ' GPT2 and BERT have technical differences in hyper-parameters, but no other conceptual differences. GPT-2 is a decode-only model trained using the left-to-right language objective and operates autoregressively, while BERT is an encoder-only model trained with the masked language-modeling objective and operates non-autoregressively. [from url2]',
 ' BERT can be used for zero- or few-shot learning through PET (Pattern-Exploiting Training) which uses templates. GPT-2 is usually used by sampling from the model, which requires a prompt and can be done with tutorials from Huggingface. [from url2]',
 ' This text discusses two topics: a YA novel about a girl in a dystopian society and the prosecutability of lying in an application for a job at a private company. It also mentions DeSantis\' theory about the US Constitution\'s "leverage points" and the use of cookies on Stack Exchange. [from url2]']''')

3498

In [28]:
len('''This text compares two of the most popular language models in natural language processing (NLP): GPT-3 and BERT. GPT-3 is an autoregressive language model developed by OpenAI, trained on 45TB of text data from sources such as Wikipedia, books, and webpages. It is capable of generating human-like text when given a prompt and can be used for tasks such as question answering, summarization, translation, and more. BERT is also a powerful language model, used for tasks such as sentiment analysis and question answering. [from url1]
GPT-3 and BERT are two popular AI content writing tools. GPT-3 is an autoregressive model, while BERT is a bidirectional transformer model which considers both left and right context when making predictions. This makes BERT better suited for tasks such as sentiment analysis or natural language understanding (NLU). BERT serves as the base for a number of services. [from url1]
GPT-3 and BERT are two large language models that use the Transformer architecture to learn context from textual-based datasets. The main difference between them lies in their training datasets, with GPT-3 having access to 45TB of data and BERT having access to 3TB of data. GPT-3 is also significantly larger than BERT due to its much more extensive training dataset size. Despite their differences, they share some similarities such as using the Transformer architecture and attention mechanisms. [from url1]
GPT-3 and BERT are unsupervised learning models that can be used for various NLP tasks such as question answering, summarization, or translation. GPT-3 tends to outperform BERT in tasks that require more data, such as summarization or translation, while BERT is better suited for tasks such as sentiment analysis or NLU due to its bidirectional nature. [from url1]
GPT-3 and BERT are both powerful tools for Natural Language Processing (NLP) tasks. However, due to their different architectures and training dataset sizes, each model is better suited for different tasks. GPT-3 is better for summarization and translation, while BERT is better for sentiment analysis and NLU. Ultimately, the choice between the two models depends on the specific needs of the user and the task they are trying to accomplish. [from url1]
Stack Exchange network consists of 181 Q&A communities, including Stack Overflow, the largest online community for developers. Data Science Stack Exchange is a question and answer site for Data science professionals, Machine Learning specialists, and those interested in learning more about the field. In the BERT paper, it is learnt that BERT is an encoder-only model, involving only transformer encoder blocks. In the GPT paper, it is learnt that GPT is a decoder-only model, involving only transformer decoder blocks. The difference between encoder and decoder blocks is that GPT Decoder looks only at previously generated tokens and learns from them, while BERT Encoder gives attention to tokens on both sides. [from url2]
This text discusses the differences between GPT-2 and BERT, two models used for natural language processing. GPT-2 is a decoder-only model trained using the left-to-right language objective and operates autoregressively, while BERT is an encoder-only model trained with the masked language-modeling objective and operates non-autoregressively. Apart from technical differences in hyper-parameters, there are no other conceptual differences between the two models. [from url2]
BERT and other masked language models can be used for zero- or few-shot learning through a method called PET (Pattern-Exploiting Training). This uses templates to exploit the language modeling abilities of BERT, such as for sentiment analysis. Working with GPT-2 is not as straightforward as with BERT, as the forward method returns hidden states that can be used as contextual embeddings. GPT-2 is usually used by sampling from the model, which requires a prompt and can be done through tutorials such as a blog post by Huggingface. [from url2]''')

3989

In [23]:
summary

'\n[GPT-3 and BERT] GPT-3 and BERT are two popular language models in natural language processing (NLP). GPT-3 is an autoregressive model trained on 45TB of text data, while BERT is a bidirectional transformer model trained on 3TB of data. GPT-3 is better for tasks that require more data, such as summarization or translation, while BERT is better for tasks such as sentiment analysis or natural language understanding (NLU). [url1]\n\n[PET] BERT and other masked language models can be used for zero- or few-shot learning through a method called PET (Pattern-Exploiting Training). GPT-2 is usually used by sampling from the model, which requires a prompt and can be done through tutorials such as a blog post by Huggingface. [url2]'

### Result

[GPT-3 and BERT] GPT-3 and BERT are two popular AI content writing tools. GPT-3 is an autoregressive model, while BERT is a bidirectional transformer model which considers both left and right context when making predictions. GPT-3 was trained on 45TB of data, while BERT was trained on 3TB of data, and GPT-3 has 1.5 billion parameters while BERT has 340 million parameters. GPT-3 is better for summarization and translation, while BERT is better for sentiment analysis and NLU. [url1]

[PET] BERT and other masked language models can be used for zero- or few-shot learning through a method called PET (Pattern-Exploiting Training). Working with GPT-2 is not as straightforward as with BERT, as the forward method returns hidden states that can be used as contextual embeddings. GPT-2 is usually used by sampling from the model, which requires a prompt and can be done through tutorials such as a blog post by Huggingface. [url2]

[Irrelevant Text] This text discusses three different topics: replacing weapon die rolls with a character's proficiency bonus, what happens when an insurance company does not mention a critical clause of a policy during a telephone purchase, and how far apart the sun has drifted from Alpha Centari due to the expansion of the universe since its formation. It also mentions the terms of service, privacy policy, and code of conduct for Stack Exchange, as well as the Cookie Policy. [url2]
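
Since the goal is structured information, the combined summary can be parsed back into records. A sketch assuming each entry follows the `[title] body [url1, url2 ...]` layout requested in `combine_template` and that entries are separated by blank lines; `parse_summary` is a hypothetical helper, and entries that do not match the layout are silently skipped.

```python
import re

# One entry: "[Title] body text [url1, url2]", with the source list at the end.
ENTRY_RE = re.compile(
    r"^\[(?P<title>[^\]]+)\]\s*(?P<body>.*?)\s*\[(?P<sources>[^\]]*)\]$",
    re.S,
)


def parse_summary(summary: str):
    """Split a combined summary into dicts with title, body, and sources."""
    records = []
    for block in re.split(r"\n\s*\n", summary.strip()):
        match = ENTRY_RE.match(block.strip())
        if match is None:
            continue  # skip text that does not follow the expected layout
        sources = [s.strip() for s in match["sources"].split(",") if s.strip()]
        records.append(
            {"title": match["title"], "body": match["body"], "sources": sources}
        )
    return records


example = (
    "\n[Make Decision] When you get the error message, "
    "you should make a decision. [url3, url6]"
    "\n\n[PET] BERT can be used for few-shot learning. [url2]"
)
parsed = parse_summary(example)
print(parsed)
```

Entries titled `[Irrelevant Text]` parse like any other, so a caller that wants to drop them would need to filter on the `title` field.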