<a href="https://colab.research.google.com/github/verneh/transformers/blob/main/custom_knowledge_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This notebook has all the code you need to create your own chatbot with a custom knowledge base using GPT-3.

Follow the instructions for each step, then run the code sample. To run a code sample, press the "play" button next to it.

# Download the data for your custom knowledge base
For demonstration purposes we are going to use ----- as our knowledge base. You can download the files to your local folder from the GitHub repository by running the code below.
Alternatively, you can put your own custom data into the local folder.

In [None]:
! git clone https://github.com/irina1nik/context_data.git

Cloning into 'context_data'...
remote: Enumerating objects: 30, done.
remote: Counting objects: 100% (2/2), done.
remote: Total 30 (delta 1), reused 1 (delta 1), pack-reused 28
Unpacking objects: 100% (30/30), 12.56 KiB | 584.00 KiB/s, done.
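
Before building an index, it can help to confirm which files actually ended up in the folder. The sketch below assumes the clone landed in `context_data` (the folder name shown in the output above). The helper name `list_knowledge_files` is hypothetical, and skipping hidden files is a choice made in this sketch, not a statement about what the index reader does.

```python
from pathlib import Path


def list_knowledge_files(directory_path):
    """Return the sorted names of the regular, non-hidden files in a folder."""
    root = Path(directory_path)
    if not root.is_dir():
        raise FileNotFoundError(f"Folder not found: {root}")
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )


# Example (run after cloning; the folder name is taken from the clone output):
# print(list_knowledge_files("context_data"))
```

If the list is empty, or the folder is missing, the index construction step further down will have nothing to work with, so it is worth checking here first.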


# Install the dependencies
Run the code below to install the dependencies we need for our functions.

In [None]:
!pip install llama-index==0.5.6
!pip install langchain==0.0.148

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting llama-index==0.5.6
  Downloading llama_index-0.5.6.tar.gz (165 kB)
  Preparing metadata (setup.py) ... done
Collecting dataclasses_json (from llama-index==0.5.6)
  Downloading dataclasses_json-0.5.8-py3-none-any.whl (26 kB)
Collecting langchain (from llama-index==0.5.6)
  Downloading langchain-0.0.215-py3-none-any.whl (1.1 MB)
Collecting openai>=0.26.4 (from llama-index==0.5.6)
  Downloading openai-0.27.8-py3-none-any.whl (73 kB)
Collecting tiktoken (from llama-index==0.5.6)
  Downloading tiktoken-0.4.0-cp

# Define the functions
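The following code defines the two functions we need: `construct_index` builds the index from a folder of documents and saves it to `index.json`, and `ask_ai` loads that file and answers questions in a loop.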

In [14]:
from llama_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext
from langchain import OpenAI
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_outputs = 3000
    # set maximum chunk overlap
    max_chunk_overlap = 20
    # set chunk size limit
    chunk_size_limit = 600

    # define prompt helper
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))

    documents = SimpleDirectoryReader(directory_path).load_data()

    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

    index.save_to_disk('index.json')

    return index

def ask_ai():
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    while True:
        query = input("What do you want to ask? ")
        response = index.query(query)
        display(Markdown(f"Response: <b>{response.response}</b>"))
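
The settings in `construct_index` interact: the model's context window has to hold both the prompt (question plus retrieved chunks) and the completion. The sketch below is only rough arithmetic under that assumption, using the values from the code above. It is not `PromptHelper`'s internal logic, and the function names are hypothetical.

```python
def prompt_token_budget(max_input_size, num_outputs, prompt_overhead=0):
    """Tokens left for the prompt once the completion has been reserved.

    prompt_overhead is a hypothetical allowance for the question and the
    prompt template text; it defaults to 0 for simplicity.
    """
    budget = max_input_size - num_outputs - prompt_overhead
    if budget <= 0:
        raise ValueError("num_outputs leaves no room for the prompt")
    return budget


def full_chunks_that_fit(budget, chunk_size_limit):
    """How many chunks of the maximum size fit into the prompt budget."""
    return budget // chunk_size_limit


budget = prompt_token_budget(max_input_size=4096, num_outputs=3000)
print(budget)                             # 1096
print(full_chunks_that_fit(budget, 600))  # 1
```

With these values, roughly one full-size chunk of context fits next to the question. Lowering `num_outputs` is one way to leave more room for retrieved context, at the price of shorter answers.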

# Set OpenAI API Key
You need an OpenAI API key to be able to run this code.

If you don't have one yet, get it by [signing up](https://platform.openai.com/overview). Then click your account icon on the top right of the screen and select "View API Keys". Create an API key.

Then run the code below and paste your API key into the text input.

In [9]:
os.environ["OPENAI_API_KEY"] = input("Paste your OpenAI key here and hit enter:")

Paste your OpenAI key here and hit enter:sk-<redacted>
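
Because `input()` echoes whatever is typed, the key ends up in the saved cell output. One way to avoid that is to read it with the standard library's `getpass`, which does not echo. This is a minimal sketch: the helper name `set_openai_key` and its `read_secret` parameter are hypothetical, and the parameter exists only so the function can be exercised without a keyboard.

```python
import os
from getpass import getpass


def set_openai_key(read_secret=getpass):
    """Prompt for the API key without echoing it and store it in the environment."""
    key = read_secret("Paste your OpenAI key here and hit enter: ").strip()
    if not key:
        raise ValueError("No API key entered")
    os.environ["OPENAI_API_KEY"] = key
    # Deliberately returns nothing, so the key is not shown as cell output.


# In the notebook, call it with no arguments:
# set_openai_key()
```

If a key has already been saved in a notebook output, the safest course is to revoke it in the OpenAI dashboard and create a new one.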


# Construct an index
Now we are ready to construct the index. This will take every file in the folder 'data', split it into chunks, and embed it with OpenAI's embeddings API.

**Notice:** running this code will cost you credits on your OpenAI account ($0.02 for every 1,000 tokens). If you've just set up your account, your free credits should be more than enough for this experiment.
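
To get a feel for the cost before running anything, the per-token rate quoted above can be turned into a back-of-the-envelope estimate. The sketch below takes the $0.02 per 1,000 tokens figure from the notice as a default parameter. The four-characters-per-token ratio is a rough assumption for English text, not an exact tokenizer, and actual prices depend on the model and may have changed.

```python
def rough_token_count(text, chars_per_token=4):
    """Very rough token estimate; chars_per_token=4 is an assumed average."""
    return len(text) // chars_per_token


def estimate_cost_usd(num_tokens, usd_per_1k_tokens=0.02):
    """Cost of processing num_tokens at a flat per-1,000-token rate."""
    return num_tokens / 1000 * usd_per_1k_tokens


# Example: a knowledge base of about 200,000 characters.
tokens = rough_token_count("x" * 200_000)
print(tokens)                                 # 50000
print(f"${estimate_cost_usd(tokens):.2f}")    # $1.00
```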

In [10]:
construct_index("data/")

<llama_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x7f614cecafb0>

# Ask questions
It's time to have fun and test our AI. Run the function that queries GPT and type your question into the input.

If you've used the provided example data for your custom knowledge base, here are a few questions that you can ask:
1. Why do people cook at home? Classify the reasons.
2. Classify what frustrates people about cooking.
3. Brainstorm marketing campaign ideas for an air fryer that would appeal to people who cook at home.
4. Which kitchen appliances do people use most often?
5. What do people like about cooking at home?
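
Conceptually, each question is answered by first finding the stored chunks most similar to the question and then handing them to the language model as context. The toy sketch below illustrates only that retrieval step. It replaces real embeddings with simple word counts, the sample chunks are made up for illustration, and none of it is llama-index's actual implementation.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query, chunks, k=1):
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, embed(chunk)),
        reverse=True,
    )
    return ranked[:k]


# Made-up chunks, standing in for pieces of the knowledge base.
chunks = [
    "people cook at home to save money",
    "air fryers are a popular kitchen appliance",
    "washing dishes is the most frustrating part of cooking",
]
print(top_k_chunks("which kitchen appliance is popular", chunks))
```

In the real index, the retrieved chunks are inserted into the prompt sent to the model. That is why answers can only be as good as the chunks that were retrieved, and why the model may still invent details that appear in none of them.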

In [None]:
ask_ai()

What do you want to ask? What is the SLC and what is the role of Correlaid in this project?


Response: <b>
The SLC is the Suriname Living Conditions Survey, a survey conducted by the government of Suriname to measure the living conditions of its citizens. Correlaid is a non-profit organization that is partnering with the SLC to provide data analysis and visualisation support for the survey. Correlaid is looking for two data analysts to help with the project, as well as a trainee to assist with the data collection and analysis.</b>

What do you want to ask? How did Correlaid help in the survey?


Response: <b>
Correlaid provided the platform for the survey to be conducted. They provided the resources and support necessary to ensure that the survey was conducted in a professional and efficient manner. They also provided the data analysis and visualisation tools needed to analyse the survey results and present them in a meaningful way.</b>

What do you want to ask? Who were the authors from Correlaid Nederlands that helped in the Hunger Risk Mapping Suriname report and what did they do?


Response: <b>

The authors from CorrelAidxNL that helped in the Hunger Risk Mapping Suriname report were:

- Jeroen van der Velden: Lead researcher and data analyst, responsible for calculating food expenditure deficits according to energy sufficient, nutrient adequate, and healthy diets, and converting all results into 2020 US dollars.
- Joram van der Heide: Lead data analyst, responsible for analyzing household consumption data and estimating necessary non-food expenses based on the 2016 Survey of Living Conditions.
- Jelmer van der Heide: Lead data analyst, responsible for scaling population numbers according to 2020 total population number for Suriname and capping deficits by food cost.
- Jelle van der Heide: Lead data analyst, responsible for providing raw data and plots of food expenditure deficits.</b>

What do you want to ask? ho were the authors from Correlaid Nederlands for the Hunger Risk mapping Suriname Report?


Response: <b>

The authors from CorrelAid Nederlands for the Hunger Risk mapping Suriname Report were a team of data scientists, researchers, and analysts. They used a combination of quantitative and qualitative data sources to analyze the food security situation in Suriname. They used open data sources such as the FAO, USDA, Comtrade, ILO, Inter-American Development Bank, Algemeen Bureau Statistiek Suriname, Centrale Bank van Suriname, CARICOM, International Red Cross, and UN, UNDP. They used this data to analyze food availability, food prices, household incomes, affordability of nutritious diet, floods, and local products and existing food aid. They then estimated food expenditure deficits for an energy sufficient diet, a nutrient adequate diet, and a healthy diet, based on household consumption data and estimates of necessary non-food expenses. All results were converted into 2020 US dollars and presented in a report on 22 December 2021.</b>

What do you want to ask? Who exactly were the authors from Correlaid Nederlands for the Hunger Risk mapping Suriname Report?


Response: <b>

The authors of the Hunger Risk Mapping Suriname Report from CorrelAidxNL were a team of data scientists, researchers, and analysts. The team included: 

- Dr. Jeroen van der Heijden, Lead Data Scientist
- Dr. Marije van der Heijden, Lead Researcher
- Dr. Jeroen van der Heijden, Lead Analyst
- Dr. Marije van der Heijden, Lead Writer
- Dr. Jeroen van der Heijden, Lead Editor
- Dr. Marije van der Heijden, Lead Designer
- Dr. Jeroen van der Heijden, Lead Programmer
- Dr. Marije van der Heijden, Lead Data Visualizer

The team used data from the 2016 Survey of Living Conditions to calculate food expenditure deficits according to three diets (energy sufficient, nutrient adequate, and healthy) and converted the results into 2020 US dollars. The raw data and plots of the results can be found at the following links: 

Energy Sufficient Diet
Raw data: https://github.com/CorrelAidxNL/slc-hunger-risk/blob/main/</b>

What do you want to ask? Can you summarize the SLC Presentation and mention who its authors are?


Response: <b>
The SLC Presentation is a project by CorrelAid, a non-profit organization that works to improve the lives of people in need through data science. The project aims to provide evidence-based data to the Suriname government to help them better understand and address the issue of food insecurity in the country. The authors of the presentation are Valerie Habbel, Patryk Kubiczek, Sandra Morgenstern, David Jankoski, Maria Bader, and Liza Olenderek. All of them have experience in data analysis, software development, and research, and are motivated to use their skills to contribute to the project.</b>

What do you want to ask? Can you summarize the project goals for SLC?


Response: <b>
The project goals for SLC are to use data analysis and visualization to support their application for emergency food aid in Suriname. They will be conducting interviews with key organizations in Suriname and researching open data sources to test the claim that there is a risk for food shortage in Suriname, and if so, which locations/regions will be affected the most. They will also be looking for data on food insecurity, such as food production, imports and exports, poverty, smallholder/subsistence farming, access to food (roads/transport), and other factors that may contribute to food insecurity.</b>

What do you want to ask? The major issue for food insecurity is reduced household incomes combined with increasing food prices. Surinamese have less income to spend on food that is increasingly expensive.   Smart Living Connection is a foundation which aims to help the socioeconomic and cultural development of Curaçao, Suriname, Aruba and Sint-Maarten. It has been involved in the design, securement of financing, and implementation of projects to stimulate economic growth, combat poverty, and safeguard cultural heritage.  Last year a report was produced pro bono publico by CorrelAidxNL volunteers in collaboration with Smart Lifestyle Connection.  They were involved in Quick Analyzed Data project on Food Security in the Caribbean Part of the Kingdom. The corona pandemic and lockdowns paralyzed the tourism industry on the islands, causing huge job losses. The result was a lack of income and food. With this project, and the resulting data analysis document, SLC was successful in securing e

Response: <b>
Smart Living Connection is looking for two data scientists/analysts to join their team in a project to help combat food insecurity in Suriname. The project will involve researching open data sources to analyze and visualize data on food security in Suriname to support SLC’s application for help. The data will be used to back up SLC’s story and convince emergency aid institutions in the Netherlands and EU of the severity and urgency of the issue. The project will start on 11th October 2021 and end on 22nd December 2021, with an expected time effort of 4 hours a week. If you have experience in Python or R for analysis and a passion for humanitarian goals, apply now before the 8th October 2021 deadline.</b>

What do you want to ask? The major issue for food insecurity is reduced household incomes combined with increasing food prices. Surinamese have less income to spend on food that is increasingly expensive.  Currently, another large food shortage is expected, this time in Suriname, mainland of South America. Help has been requested, including from the IMF, but it is expected to only come through in six months or more. Already political, economic, and financial reforms by a new government are on their way, but it is important to convince emergency aid institutions in the Netherlands and EU of the severity and urgency of the issue.  Smart Lifestyle Connection is a foundation which aims to help the socioeconomic and cultural development of Curaçao, Suriname, Aruba and Sint-Maarten. It has been involved in the design, securement of financing, and implementation of projects to stimulate economic growth, combat poverty, and safeguard cultural heritage. Last year, they were involved in the Quic

Response: <b>
The Caribbean islands of Curaçao, Suriname, Aruba, and Sint-Maarten have been hit hard by the coronavirus pandemic. With the tourism industry paralyzed, many people have lost their jobs and are struggling to make ends meet. Smart Lifestyle Connection (SLC), a foundation that works to promote socioeconomic and cultural development in the region, has been working to secure emergency food aid for the islands. 

Recently, SLC has noted signs of hunger risk in Suriname and requested a quickscan on hunger risk with available open data. In response, CorrelAidxNL volunteers produced a report on Hunger Risk Mapping Suriname pro bono publico. The report used different open data sources to search for data on food insecurity, such as food production, imports and exports, poverty, smallholder/subsistence farming, access to food (roads/transport), and other factors that may contribute to food insecurity. The report found that the major issue for food insecurity is reduced household incomes combined with increasing food prices. Surinamese have less income to spend on food that is increasingly expensive. In addition, the floods in 2021 have resulted in loss of household assets and flooding of agricultural fields, and</b>