## Database Creation

In [7]:
import sqlite3

def initialize_db(db_name="context_data.db"):
    conn = sqlite3.connect(db_name)
    cursor = conn.cursor()
    
    # Create a table to store contexts
    cursor.execute('''CREATE TABLE IF NOT EXISTS contexts
                     (id INTEGER PRIMARY KEY, context TEXT)''')
    conn.commit()
    return conn, cursor

def insert_context(conn, cursor, context):
    cursor.execute("INSERT INTO contexts (context) VALUES (?)", (context,))
    conn.commit()

conn, cursor = initialize_db()

# Insert a sample context
# insert_context(conn, cursor, "The sun is a star located at the center of our Solar System.")
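
The `insert_context` helper above commits once per row. For bulk loads, a minimal sketch that batches rows with `executemany` in a single transaction might look like the following. It assumes the same `contexts` schema; `insert_contexts_bulk` is a hypothetical helper name, and the in-memory database is used only for illustration.

```python
import sqlite3

def insert_contexts_bulk(conn, contexts):
    """Insert many contexts in one transaction (hypothetical helper)."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO contexts (context) VALUES (?)",
            [(c,) for c in contexts],
        )

# In-memory database so the example leaves no file behind
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE IF NOT EXISTS contexts (id INTEGER PRIMARY KEY, context TEXT)"
)
insert_contexts_bulk(demo_conn, ["first context", "second context"])
row_count = demo_conn.execute("SELECT COUNT(*) FROM contexts").fetchone()[0]
print(row_count)
```

A single transaction avoids one disk sync per row, which tends to matter once a document is split into thousands of contexts.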


## Model Generation

In [2]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
model = GPT2LMHeadModel.from_pretrained('gpt2-medium')

Downloading model.safetensors: 100%|██████████| 1.52G/1.52G [03:40<00:00, 6.88MB/s]
Downloading (…)neration_config.json: 100%|██████████| 124/124 [00:00<00:00, 42.8kB/s]


## Save Model Locally

In [3]:
# Define your saving path
save_directory = "./local_gpt2_model"

# Save the model
model.save_pretrained(save_directory)

# Save the tokenizer
tokenizer.save_pretrained(save_directory)


('./local_gpt2_model/tokenizer_config.json',
 './local_gpt2_model/special_tokens_map.json',
 './local_gpt2_model/vocab.json',
 './local_gpt2_model/merges.txt',
 './local_gpt2_model/added_tokens.json')

## Access Local Model

In [1]:
# Recall your saving path
save_directory = "./local_gpt2_model"

In [2]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the model
model = GPT2LMHeadModel.from_pretrained(save_directory)

# Load the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained(save_directory)



## Helper Functions

In [3]:
def generate_answer_gpt2(question, context, max_length=500):
    input_text = f"{context}. {question}"
    input_ids = tokenizer.encode(input_text, return_tensors='pt')
    
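    # Generate a response from the model.
    # Note: top_k and top_p only take effect when do_sample=True; as written, decoding is greedy.
    outputs = model.generate(
        input_ids,
        max_length=max_length,  # counts prompt tokens plus generated tokens
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=2,
        top_k=50,
        top_p=0.95,
    )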
    
    # Decode the output
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    # The answer is whatever was added by the model, after the input text
    answer = generated_text[len(input_text):].strip()
    
    return answer
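
Slicing the decoded string by `len(input_text)` assumes that decoding reproduces the prompt character for character. A token-level alternative is sketched below with toy token ids, so it runs without the model; `extract_continuation` is a hypothetical helper. With the real model, the equivalent step would be slicing `outputs[0][input_ids.shape[-1]:]` before calling `tokenizer.decode`.

```python
def extract_continuation(output_ids, prompt_ids):
    """Return only the tokens generated after the prompt (hypothetical helper)."""
    prompt_len = len(prompt_ids)
    # Guard: the output is expected to start with the prompt tokens
    if list(output_ids[:prompt_len]) != list(prompt_ids):
        raise ValueError("output does not start with the prompt tokens")
    return list(output_ids[prompt_len:])

# Toy token ids standing in for real tokenizer output (illustrative only)
prompt_ids = [101, 7, 42]
output_ids = [101, 7, 42, 9, 13]
new_tokens = extract_continuation(output_ids, prompt_ids)
print(new_tokens)
```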

In [11]:
def retrieve_all_contexts(cursor):
    cursor.execute("SELECT context FROM contexts")
    return [row[0] for row in cursor.fetchall()]

In [4]:
def retrieve_relevant_contexts(question, cursor, limit=5):
    # Extract keywords from the question (this is a simple approach, can be improved)
    keywords = question.split()
    
    # Query the database to find contexts that contain the keywords
    contexts = []
    for keyword in keywords:
        cursor.execute("SELECT context FROM contexts WHERE context LIKE ?", ('%' + keyword + '%',))
        contexts.extend([row[0] for row in cursor.fetchall()])
        
    # Return unique contexts in first-seen order (a plain set would make the order arbitrary)
    return list(dict.fromkeys(contexts))[:limit]
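
Splitting the question on whitespace keeps punctuation attached (`"sun?"`) and treats words such as "What" or "the" as keywords, so nearly every context matches. One possible refinement, sketched under assumptions, normalizes the keywords, drops a small stopword list, and ranks contexts by how many keywords they contain. `STOPWORDS`, `extract_keywords`, and `retrieve_ranked_contexts` are hypothetical names, and the demo data lives in an in-memory database.

```python
import re
import sqlite3

# Small illustrative stopword list (an assumption, not exhaustive)
STOPWORDS = {"what", "is", "the", "a", "an", "of", "are", "and", "in", "to", "it"}

def extract_keywords(question):
    # Lowercase, keep alphanumeric runs only, drop stopwords
    words = re.findall(r"[a-z0-9]+", question.lower())
    return [w for w in words if w not in STOPWORDS]

def retrieve_ranked_contexts(question, cursor, limit=5):
    """Rank contexts by how many distinct keywords they contain."""
    hits = {}
    # dict.fromkeys removes duplicate keywords while keeping their order
    for keyword in dict.fromkeys(extract_keywords(question)):
        cursor.execute(
            "SELECT context FROM contexts WHERE context LIKE ?",
            ("%" + keyword + "%",),
        )
        for (context,) in cursor.fetchall():
            hits[context] = hits.get(context, 0) + 1
    # Highest hit count first; ties broken alphabetically for determinism
    ranked = sorted(hits, key=lambda c: (-hits[c], c))
    return ranked[:limit]

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE contexts (id INTEGER PRIMARY KEY, context TEXT)")
demo_cursor.executemany(
    "INSERT INTO contexts (context) VALUES (?)",
    [
        ("The sun is a star at the center of our Solar System.",),
        ("The moon orbits the Earth.",),
        ("A star is a ball of hot gas.",),
    ],
)
results = retrieve_ranked_contexts("What is the sun? Is it a star?", demo_cursor)
print(results)
```

`LIKE` still matches substrings, so "sun" would also match "Sunday"; a full-text index would be a more robust choice for larger corpora.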

In [13]:
# Integrate with the retriever
def answer_question_gpt2(question, cursor):
    contexts = retrieve_all_contexts(cursor)
    for context in contexts:
        answer = generate_answer_gpt2(question, context)
        if answer:
            return answer
    return "I don't have enough information to answer that question."

In [5]:
def answer_question_gpt2_v2(question, cursor):
    # Fetch the most relevant contexts for the question
    contexts = retrieve_relevant_contexts(question, cursor)
    
    # If no context found, provide a default answer
    if not contexts:
        return "I don't have enough information to answer that question."
    
    # Loop through each context and generate an answer using GPT-2
    for context in contexts:
        answer = generate_answer_gpt2(question, context)
        if answer:
            return answer
    return "I couldn't generate a satisfactory answer based on the provided contexts."


## Testing

In [8]:
# Sample question
print(answer_question_gpt2_v2("What is the sun?", cursor))

What are the stars? How do you know the weather? Where are you? Who are your neighbors? Do you have a car? Are you a student? A teacher? An employee? Is your car insured? If you are a resident of the City of Chicago, you may submit a bid for a vehicle. If your bid is accepted, the vehicle will be placed on the auction block. The bidding will begin at the beginning of bidding and will continue until all bids have been received. When the bidding is complete, a winner will receive a receipt for the winning bid. A winner may not be required to pay any additional fees or taxes.

Bidding on a Vehicle: When a bidder places a winning bidder, he or she will have the right to bid on behalf of any other bidder. Bidding will not take place until the closing of bids. In the event that a bidding war occurs, each bidder will bid in the order in which they placed their bids, and the winner of that bidding conflict will win the bid that was placed first. No bid will exceed the highest bid received by t

## Feeding DPM

In [None]:
def read_and_split_data(file_name):
    with open(file_name, 'r', encoding='utf-8') as file:
        # Read all lines from the file
        lines = file.readlines()
        
        # Join lines and split by '.'
        full_text = ''.join(lines)
        contexts = full_text.split('.')
        
        # Strip whitespaces from each context
        return [context.strip() for context in contexts if context.strip()]
    
# Step 1: Initialize the database
conn, cursor = initialize_db()

# Step 2: Read and split the data from the file
contexts = read_and_split_data('content_Defence_Audit_Manual_Vol_A_Office_Mannual_20210121123223.txt')

# Step 3: Insert each context into the database
for context in contexts:
    insert_context(conn, cursor, context)
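
Splitting on every `'.'` also breaks decimals and paragraph numbers such as `12.4` into separate contexts. A hedged alternative is to split only where sentence-ending punctuation is followed by whitespace, as sketched below. `split_into_sentences` is a hypothetical helper and the sample text is invented for illustration; abbreviations followed by a space would still be split.

```python
import re

def split_into_sentences(text):
    """Split on ., ! or ? only when followed by whitespace."""
    # Collapse newlines and runs of spaces so line breaks do not end up inside a context
    normalized = re.sub(r"\s+", " ", text).strip()
    # The lookbehind keeps the punctuation attached to the preceding sentence
    parts = re.split(r"(?<=[.!?])\s+", normalized)
    return [p for p in parts if p]

sample = "Para 12.4 applies to all units.\nThe limit is 3.5 lakh. Is audit due?"
sentences = split_into_sentences(sample)
print(sentences)
```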

In [None]:
# Sample question on DPM
print(answer_question_gpt2_v2("What are the top 3 general duties of command officers and senior audit officers", cursor))

?
1. To ensure that the financial management of a department is in accordance with the law and the regulations.
2. In the event of any irregularities, to ensure the proper management and control of funds. 3. If the audit is not carried out in a timely manner, or if the report is incomplete, the
departmental head should be informed of such irregulars and should take appropriate measures to rectify them. The DGAD has
also recommended that a report on the performance of audit functions should also be prepared by
the DGAS.


In [None]:
# Sample question outside the DPM content
print(answer_question_gpt2_v2("What is Vsauce?", cursor))

vauche is the Latin word for "to make a bargain" and is used to describe a legal agreement. It is also used in the English language to mean a written contract or a document that is signed by two or more parties.
,
.

,

 ,  . 
  
 vauch is not a word that means "a contract" or "an agreement". It means a set or series of agreements, or the agreement of two parties to a common object. The word vaul is derived from the French word "vauler" which means to make an offer or contract.
Vaucher is one of the most common terms used for legal agreements. A vauncher agreement is usually a formal agreement between two people who have agreed to the terms of a particular contract and to pay a certain amount of money. In the United States, vouchers are usually written in a form that makes it easy for the parties involved to understand and agree to it. Voucher agreements are often used as a way to settle disputes between parties, to resolve disputes over the amount or terms that should be paid, and as

In [None]:
print(answer_question_gpt2_v2("What is Science?", cursor))

Science is the study of the natural world, the understanding of its laws, and the application of those laws to human affairs. Science is also the art of understanding the laws of nature, of applying those principles to the human condition.
,
...


 
The first part of this book is devoted to a discussion of what is science, what it is not, how it differs from other sciences, its relation to religion, philosophy, politics, economics, law, art, literature, etc. The second part is concerned with the nature of science and its relationship to other branches of knowledge. It is then followed by a brief discussion on the relationship between science as a science of human knowledge and science in general.

,

  .  
A. A. Smith, The Science of Human Knowledge, (New York: Harper & Brothers, 1894), p. 5. B. J. F. Huxley, Science and Society, p, 5-6. C. W. M. G. S. Lewis, "The Nature of Science," in The Cambridge History of Philosophy, ed. R. L. Macpherson (Cambridge: Cambridge University Press, 19