# Introduction

This notebook has all the code you need to create your own chatbot with a custom knowledge base using GPT-3.

Follow the instructions for each step and then run the code sample.


# Download the data for your custom knowledge base
For demonstration purposes, we are going to use the NEOM wiki as our knowledge base. You can download the files to your local folder from the GitHub repository by running the code below.
Alternatively, you can put your own custom data into the local folder.
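
If you bring your own data, a minimal sketch like the following creates the local folder and writes one plain-text file into it. The folder name `sample_data` matches the folder indexed later in this notebook; the file name `my_notes.txt` and its contents are placeholders for illustration, not part of the original dataset.

```python
from pathlib import Path

# Folder that the index is built from later in this notebook.
data_dir = Path("sample_data")
data_dir.mkdir(exist_ok=True)

# Hypothetical file name and text; replace with your own documents.
note_path = data_dir / "my_notes.txt"
note_path.write_text(
    "Replace this text with your own knowledge base content.\n",
    encoding="utf-8",
)

# List what the folder now contains.
print(sorted(p.name for p in data_dir.iterdir()))
```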

# Install the dependencies
Run the code below to install the dependencies we need for our functions.

In [7]:
!pip install llama-index==0.5.6
!pip install langchain==0.0.148

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# Define the functions
The following code defines the functions we need to construct the index and query it.

In [8]:
from llama_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext
from langchain import OpenAI
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_outputs = 2000
    # set maximum chunk overlap
    max_chunk_overlap = 20
    # set chunk size limit
    chunk_size_limit = 600 

    # define prompt helper
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    # define LLM
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))
 
    documents = SimpleDirectoryReader(directory_path).load_data()
    
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

    index.save_to_disk('index.json')

    return index

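def ask_ai():
    # load the index that construct_index saved to disk
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
    while True:
        query = input("What do you want to ask? ")
        # an empty question ends the loop instead of running forever
        if not query.strip():
            break
        response = index.query(query)
        display(Markdown(f"Response: <b>{response.response}</b>"))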
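
To see how the settings in `construct_index` relate to each other, here is a rough token-budget calculation using the same values. It assumes the prompt template and question together take about 100 tokens, which is an illustrative guess rather than a measured number.

```python
# Values copied from construct_index above.
max_input_size = 4096    # model context window, in tokens
num_outputs = 2000       # tokens reserved for the answer
chunk_size_limit = 600   # maximum tokens per document chunk

# Assumption for illustration only: tokens used by the prompt template and question.
prompt_overhead = 100

# Tokens left for retrieved context after reserving the answer and the prompt.
context_budget = max_input_size - num_outputs - prompt_overhead
max_full_chunks = context_budget // chunk_size_limit

print(f"context budget: {context_budget} tokens")
print(f"full-size chunks that fit: {max_full_chunks}")
```

Lowering `num_outputs` leaves more of the context window for retrieved text, at the cost of shorter answers.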

# Set OpenAI API Key
You need an OpenAI API key to be able to run this code.

If you don't have one yet, get it by [signing up](https://platform.openai.com/overview). Then click your account icon on the top right of the screen and select "View API Keys". Create an API key.

Then run the code below and paste your API key into the text input.

In [11]:
from getpass import getpass

# getpass hides the key as you type, so it is not stored in the notebook output
os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key here and hit enter:")

Paste your OpenAI key here and hit enter:··········


# Construct an index
Now we are ready to construct the index. This will take every file in the folder 'sample_data', split it into chunks, and embed each chunk with OpenAI's embeddings API.
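
The chunking step can be pictured with the simplified sketch below. It splits on whitespace rather than using the model's tokenizer, so it only approximates what llama-index does internally; `split_into_chunks` is a hypothetical helper written for this illustration, not a library function. The default sizes mirror `chunk_size_limit` and `max_chunk_overlap` from `construct_index`.

```python
def split_into_chunks(text, chunk_size=600, overlap=20):
    """Split text into word chunks where consecutive chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

# Toy document of 1300 numbered words (w0, w1, ...), used only for illustration.
sample_text = " ".join(f"w{i}" for i in range(1300))
chunks = split_into_chunks(sample_text)
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap means the last few words of one chunk reappear at the start of the next, so a sentence that straddles a boundary is less likely to be cut off from its context.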



In [12]:
construct_index("sample_data")

<llama_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x7fa15704c130>

# Ask questions
It's time to test our AI and have fun. Run the function that queries GPT and type your question into the input. 

In [None]:
ask_ai()

What do you want to ask? what is neom ?


Response: <b>

Neom is a planned smart city in Tabuk Province in northwestern Saudi Arabia. It is a megaproject with an estimated cost of over $500 billion, and plans include multiple regions, including a floating industrial complex, global trade hub, tourist resorts, and a linear city—all powered exclusively by renewable energy sources. The name "Neom" is a portmanteau of the Ancient Greek prefix νέο Neo- meaning "new" and the first letter of Saudi Crown Prince Mohammed bin Salman's name and the first letter of the Arabic word for "future" (Arabic: مستقبل, romanized: Mustaqbal, Hejazi pronunciation: [mʊsˈtaɡbal]). It is intended to be a data-driven city, with data collected from the smartphones of the residents, their homes, facial recognition cameras and multiple other sensors, to help developers feed the collected information to the city for further predicting and customizing every user's needs. However, the project has been met with criticism due to Saudi Arabia's poor human rights record and use of espionage and surveillance technology for spying on its citizens, as well as a sponsorship with the League of Legends European Championship which gathered significant backlash</b>

What do you want to ask? what are the different projects in neom


Response: <b>

The different projects in Neom include The Line, Neom Bay, Neom Bay Airport, Oxagon, Trojena, Sindalah, Agriculture, Utilities, International Relations, water, healthcare, transport, security, data collection, and surveillance technology. Despite the potential for data collection and surveillance technology to invade the privacy and security of Neom's citizens, the Saudi Ministry of Communications and Information Technology has not responded to digital rights experts and researchers' requests for comments.</b>

What do you want to ask? by when neom is completing its project


Response: <b>

Neom plans for the majority of the city to be complete by 2039, although there have been some criticisms of the project due to its potential for surveillance and privacy invasion.</b>

What do you want to ask? who is the boss of neom


Response: <b>

The boss of Neom is Saudi Crown Prince Mohammed bin Salman, who has been criticized for his use of espionage and surveillance technology for spying on citizens and invading their privacy and security.</b>

What do you want to ask? tell some great things of neom


Response: <b>

Some great things about Neom include its ambitious plans to become a smart city powered exclusively by renewable energy sources, its commitment to modern manufacturing, industrial research, and development, its plans for a car-free city with all basic services within a 5-minute walking distance, its plans for a luxury resort complex off the city coast, its plans for 6,500 hectares (16,000 acres) of agricultural fields, its partnership with Mercedes-EQ Formula E Team, its plans to build the world's largest green hydrogen plant, its four-year global sponsorship agreement with the Asian Football Confederation, its plans to host Extreme E's 2022 Desert X-Prix and Island X-Prix, its plans to host the 2029 Asian Winter Games, its plans to use data as a currency to manage and provide facilities such as power, waste, water, healthcare, transport and security, and its plans to collect data from the smartphones of the residents, their homes, facial recognition cameras and multiple other sensors. However, the city has been criticized for its potential to invade the privacy and security of its citizens, as well as its sponsorship with the League of Legends European Championship.</b>
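
Roughly speaking, each answer above is produced by first retrieving the indexed chunks whose embeddings are closest to the question's embedding and then passing those chunks to the model. The sketch below illustrates only the retrieval step, using made-up three-dimensional vectors and cosine similarity. Real embeddings come from the OpenAI embeddings API and have far more dimensions; the chunk labels, vector values, and the `top_k_chunks` helper are all hypothetical and not part of llama-index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunk_vecs, k=1):
    """Return the names of the k chunks most similar to the query vector."""
    scored = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

# Toy "index": chunk label -> invented embedding vector.
toy_index = {
    "chunk about The Line": [0.9, 0.1, 0.0],
    "chunk about Trojena": [0.1, 0.9, 0.0],
    "chunk about Oxagon": [0.0, 0.2, 0.9],
}

# Invented embedding for a question that is mostly about The Line.
query_embedding = [0.8, 0.2, 0.1]
print(top_k_chunks(query_embedding, toy_index, k=2))
```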