## Build a personal (IMDB movie) knowledge base using ChatGPT

Process:

1. Install the required packages.
2. Define a function that builds the personal knowledge base with the GPT-3.5 model.
3. Define a function that provides a question-and-answer loop using GPT.
4. Provide the API key and the data source (personal database) that the API needs.
5. Demonstrate how to use the customized AI.

### 1. Install the required packages

In [6]:
# !pip install llama-index
# !pip install langchain
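
The class names imported in the next step may differ between llama-index releases, so it can help to record which versions are actually installed before running the notebook. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the versions of the two packages this notebook installs.
for name in ("llama-index", "langchain"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```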

### 2. Define a function that builds the personal knowledge base with the GPT-3.5 model

In [16]:
from llama_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext
from langchain import OpenAI
import os
from IPython.display import Markdown, display

def construct_index(directory_path):
    """Build a vector index over the files in directory_path and save it to index.json."""
    # maximum number of tokens the model accepts as input
    max_input_size = 4096
    # number of tokens reserved for the model's output
    num_outputs = 2000
    # maximum number of tokens shared between adjacent chunks
    max_chunk_overlap = 20
    # upper limit on the size of each text chunk, in tokens
    chunk_size_limit = 600

    # the prompt helper splits text so that prompts fit within the limits above
    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    # define the LLM (the commented-out line is the earlier text-davinci-003 variant)
    # llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="text-davinci-003", max_tokens=num_outputs))
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.5, model_name="gpt-3.5-turbo", max_tokens=num_outputs))

    # load every file found in the data directory
    documents = SimpleDirectoryReader(directory_path).load_data()

    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

    # persist the index so ask_ai() can reload it without re-embedding the documents
    index.save_to_disk('index.json')

    return index
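
The numeric settings in `construct_index` interact: tokens reserved for the output reduce the room left for the prompt and the retrieved text, and the chunk overlap means neighbouring chunks share a few tokens. The sketch below illustrates both ideas with plain Python lists. It is a simplified illustration under those assumptions, not how llama_index splits text internally, and the function names `context_budget` and `split_with_overlap` are hypothetical.

```python
def context_budget(max_input_size, num_outputs):
    """Tokens left for the prompt and retrieved text once the output is reserved."""
    return max_input_size - num_outputs


def split_with_overlap(tokens, chunk_size, overlap):
    """Split a token list into chunks where consecutive chunks share `overlap` tokens."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Values taken from construct_index: 4096 input tokens, 2000 reserved for output.
budget = context_budget(4096, 2000)  # 2096

# Toy example: ten integer "tokens", chunks of four, one shared token between chunks.
toy_chunks = split_with_overlap(list(range(10)), chunk_size=4, overlap=1)
# toy_chunks is [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```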

### 3. Define a function that provides a question-and-answer loop using GPT

In [21]:
def ask_ai():
    # reload the index that construct_index() saved to disk
    index = GPTSimpleVectorIndex.load_from_disk('index.json')

    while True:
        query = input("What do you want to ask? >> ")

        # stop the loop when the user types "stop"
        if query == 'stop':
            print('Goodbye!')
            break

        response = index.query(query)
        display(Markdown(f"Response: <b>{response.response}</b>"))
        print('='*20)
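
Conceptually, querying a vector index means embedding the question, finding the stored chunks whose embeddings are most similar, and handing those chunks to the LLM as context. The toy sketch below shows only the similarity-ranking step, with hand-made three-dimensional vectors standing in for real embeddings. It is an illustration of the idea, not llama_index's implementation, and the names `cosine_similarity`, `top_k`, and `toy_chunks` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=1):
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(
        chunk_vecs,
        key=lambda chunk_id: cosine_similarity(query_vec, chunk_vecs[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up vectors; a real index would store embeddings returned by the embedding model.
toy_chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.0, 1.0, 0.0],
    "chunk_c": [0.7, 0.7, 0.0],
}
best = top_k([0.9, 0.1, 0.0], toy_chunks, k=2)
# best is ["chunk_a", "chunk_c"]
```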

### 4. Provide the API key and the data source (personal database) that the API needs

In [18]:
from api_secrets import chatgpi_api_key

os.environ["OPENAI_API_KEY"] = chatgpi_api_key
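
The cell above assumes a local `api_secrets` module that defines `chatgpi_api_key`. As one possible alternative, the key could be exported in the shell and checked early, so that a missing key fails with a clear message before any indexing starts. This is a minimal sketch; the helper name `require_env` and the placeholder key are hypothetical.

```python
import os


def require_env(name, environ=os.environ):
    """Return a non-empty environment variable, or raise a clear error."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before building the index")
    return value


# A stand-in mapping is used here so the sketch runs without a real key.
demo_env = {"OPENAI_API_KEY": "sk-placeholder"}
key = require_env("OPENAI_API_KEY", demo_env)
```

In the notebook itself, calling `require_env("OPENAI_API_KEY")` without the second argument would read from the real process environment.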

In [19]:
construct_index("data")

INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 52583 tokens


<llama_index.indices.vector_store.vector_indices.GPTSimpleVectorIndex at 0x1a24a8a2c70>

### 5. Demonstrate how to use the customized AI

The following **three** custom questions about IMDB movies are used to test the AI:

>1. According to the user data, which movies rank first to third on IMDB? (The ranking comes from data scraped from the web on 2023/04/18.)
>2. Which movie has the most likes of all the movies? (The likes data is fake data supplied by the user.)
>3. Who is the director of the movie "亂世兒女"?

In [25]:
ask_ai()

What do you want to ask? >> According to user data, what is the top 1 to top 3 movie in IMDB ranking?


INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4412 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 19 tokens


Response: <b>

The top 1 to top 3 movies in IMDB ranking according to user data are: 
1. 刺激1995 (9.2 rating, 333 likes)
2. 教父 (9.2 rating, 616 likes)
3. 黑暗騎士 (9.0 rating, 546 likes)
4. 火線追緝令 (8.6 rating, 103 likes)</b>

What do you want to ask? >> Which movie get the highest likes numbers of all movie?


INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4273 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 11 tokens


Response: <b>

The movie with the highest number of likes of all movies is 神鬼戰士, with a total of 596 likes. It is the Top 37 movie in the IMDB top 250 ranking, with a rating of 8.5. It is directed by Ridley Scott and stars Russell Crowe and Joaquin Phoenix. The information web page of this movie in IMDB website is https://www.imdb.com/title/tt0110357/.</b>

What do you want to ask? >> Who is the director of the movie "亂世兒女"?


INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4205 tokens
INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 16 tokens


Response: <b>

The director of the movie "亂世兒女" is Michel Gondry.</b>

What do you want to ask? >> stop
Goodbye!
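
Each query above logs its LLM and embedding token usage. To see what a session consumed in total, the counts can be summed from the captured log lines. The sketch below parses lines in the format shown in the output above; the function name `total_tokens` and the regular expression are assumptions made for this illustration, not part of llama_index.

```python
import re

# Matches lines such as "... [query] Total LLM token usage: 4412 tokens".
TOKEN_LINE = re.compile(
    r"\[(?P<stage>\w+)\] Total (?P<kind>LLM|embedding) token usage: (?P<count>\d+) tokens"
)


def total_tokens(log_lines):
    """Sum LLM and embedding token counts found in token-counter log lines."""
    totals = {"LLM": 0, "embedding": 0}
    for line in log_lines:
        match = TOKEN_LINE.search(line)
        if match:
            totals[match.group("kind")] += int(match.group("count"))
    return totals


# Log lines copied from the three queries in the session above.
query_logs = [
    "INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4412 tokens",
    "INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 19 tokens",
    "INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4273 tokens",
    "INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 11 tokens",
    "INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 4205 tokens",
    "INFO:llama_index.token_counter.token_counter:> [query] Total embedding token usage: 16 tokens",
]
totals = total_tokens(query_logs)
print(totals)  # {'LLM': 12890, 'embedding': 46}
```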
