In this project, we build a simple research database management system in Python using libraries such as tkinter, pandas, and NLTK. The program lets users submit new research papers, search the database for papers by keyword, and export search results to a CSV file. We also apply text-cleaning techniques such as stop-word removal with NLTK. The code is designed to be easy to navigate, so that it is accessible to researchers with varying levels of programming experience.

## Approach

The research database system uses Python and its libraries to provide a command-line interface and a graphical user interface. Research papers are stored in a pandas DataFrame, which makes the data easy to query and manipulate. The system also uses the Natural Language Toolkit (NLTK) to preprocess text, for example by removing stop words, before running search queries.
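The stop-word step described above can be sketched with only the standard library. This is a minimal illustration, not the notebook's implementation: `preprocess` and the tiny `STOP_WORDS` set are hypothetical stand-ins for NLTK's English stop-word list, and stemming is left out.

```python
import re

# Hypothetical, deliberately tiny stop-word set used only for illustration;
# the notebook relies on nltk.corpus.stopwords.words('english') instead.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def preprocess(query):
    """Lowercase the query, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [token for token in tokens if token not in STOP_WORDS]


print(preprocess("A Survey of Machine Learning for Fraud Detection"))
# ['survey', 'machine', 'learning', 'fraud', 'detection']
```

Dropping words such as "of" and "for" matters because they occur in almost every abstract and would otherwise make nearly every paper match.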

For the command-line interface, users run a search by entering a keyword or phrase, which the system matches against the DataFrame to find relevant research papers. The results are then displayed with each paper's title, author, abstract, and field.

For the graphical user interface, users fill out a form to submit their own research paper to the database. The form has fields for the paper's title, author, abstract, and field. When the user submits the form, the data is appended to the pandas DataFrame, saved back to the CSV file, and a confirmation message is displayed.

Overall, the approach aims to provide a simple and intuitive interface for users to interact with the research database, while also using modern data processing techniques to ensure that search queries are as accurate and relevant as possible.

## Libraries

In [1]:
# Import libraries
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import numpy as np
import pandas as pd
import tkinter as tk
from tkinter import messagebox
import csv



## Data Cleaning

In [2]:
# Load the research database
research_raw = pd.read_csv('research_database.csv')

In [3]:
research_raw.head()

Unnamed: 0,title,author,abstract,keywords1,key2,key3
0,Optimizing Neural Network Training with Stocha...,John Smith,This paper proposes a new method for optimizin...,neural network,optimization,stochastic gradient descent
1,A Survey of Machine Learning Techniques for Fr...,Jane Doe,This paper reviews the current state of the ar...,machine learning,fraud detection,
2,A Comparative Study of Hashing Algorithms for ...,David Johnson,This paper compares the performance and securi...,passwords,hashing,security
3,A Review of Natural Language Processing Techni...,Sara Lee,This paper surveys the current state of the ar...,natural language processing,sentiment analysis,
4,A Study of Performance Tuning Techniques for R...,Adam Chen,This paper examines various performance tuning...,performance tuning,relational databases,


In [4]:
research_raw.key3 = research_raw.key3.replace(np.nan, '')

In [5]:
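# Strip each keyword column and join with single spaces so adjacent keywords cannot run together
research_raw['keywords'] = (research_raw.keywords1.str.strip() + ' ' + research_raw.key2.str.strip() + ' ' + research_raw.key3.str.strip()).str.strip()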

In [6]:
research_raw.head()

Unnamed: 0,title,author,abstract,keywords1,key2,key3,keywords
0,Optimizing Neural Network Training with Stocha...,John Smith,This paper proposes a new method for optimizin...,neural network,optimization,stochastic gradient descent,neural network optimization stochastic gradien...
1,A Survey of Machine Learning Techniques for Fr...,Jane Doe,This paper reviews the current state of the ar...,machine learning,fraud detection,,machine learning fraud detection
2,A Comparative Study of Hashing Algorithms for ...,David Johnson,This paper compares the performance and securi...,passwords,hashing,security,passwords hashing security
3,A Review of Natural Language Processing Techni...,Sara Lee,This paper surveys the current state of the ar...,natural language processing,sentiment analysis,,natural language processing sentiment analysis
4,A Study of Performance Tuning Techniques for R...,Adam Chen,This paper examines various performance tuning...,performance tuning,relational databases,,performance tuning relational databases


In [7]:
research_df = research_raw[['title', 'author', 'abstract', 'keywords']]

In [8]:
research_df

Unnamed: 0,title,author,abstract,keywords
0,Optimizing Neural Network Training with Stocha...,John Smith,This paper proposes a new method for optimizin...,neural network optimization stochastic gradien...
1,A Survey of Machine Learning Techniques for Fr...,Jane Doe,This paper reviews the current state of the ar...,machine learning fraud detection
2,A Comparative Study of Hashing Algorithms for ...,David Johnson,This paper compares the performance and securi...,passwords hashing security
3,A Review of Natural Language Processing Techni...,Sara Lee,This paper surveys the current state of the ar...,natural language processing sentiment analysis
4,A Study of Performance Tuning Techniques for R...,Adam Chen,This paper examines various performance tuning...,performance tuning relational databases
...,...,...,...,...
64,A Study of the Impact of Interest Rate Policy ...,Samuel Adejare,This paper examines the impact of interest rat...,interest rate policy consumer behavior
65,A Comparative Study of the Performance of Diff...,Bolu Adetunji,This paper compares the performance of differe...,investment vehicles performance
66,A Study of the Impact of Foreign Direct Invest...,Busayo Ogundele,This paper analyzes the impact of foreign dire...,foreign direct investment economic growth
67,An Analysis of the Efficiency of the Stock Mar...,Goodness Okonkwo,This paper examines the efficiency of the stoc...,stock market efficiency capital allocation


In [None]:
research_df.head(1)

In [9]:
research_df.to_csv('research_database_clean.csv', index=False)

In [10]:
research_df = pd.read_csv('research_database_clean.csv')

## Retrieval Function

In [19]:
# Download NLTK stopwords if not already downloaded
# nltk.download('stopwords')

def search_paper(topic):
    # Load the CSV file; empty cells become '' so that .lower() never hits a NaN
    df = pd.read_csv('research_database_clean.csv').fillna('')

    # Lowercase the query words and drop English stopwords
    stop_words = set(stopwords.words('english'))
    topic_words = [word.lower() for word in topic.split() if word.lower() not in stop_words]

    # Keep the papers in which any query word occurs (as a substring) in any searchable field
    fields = ['title', 'abstract', 'author', 'keywords']
    found_papers = df[df.apply(lambda row: any(word in str(row[field]).lower()
                                               for field in fields
                                               for word in topic_words), axis=1)]

    # Convert the matching papers to a list of dictionaries
    found_papers = found_papers.to_dict('records')

    return found_papers


In [20]:
# search_paper('Information In Retrieval Fraud')
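Because `search_paper` returns a list of dictionaries, exporting the results to CSV needs little more than the `csv` module imported earlier. The helper below is a sketch under assumptions: `export_results` and the output path are hypothetical names that do not appear in the notebook.

```python
import csv


def export_results(papers, path):
    """Write a list of paper dictionaries to a CSV file and return the row count."""
    fieldnames = ["title", "author", "abstract", "keywords"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" silently drops keys that are not in fieldnames
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(papers)
    return len(papers)
```

A call could look like `export_results(search_paper('fraud detection'), 'search_results.csv')`. Opening the file with `newline=""` follows the `csv` module's recommendation and avoids blank lines between rows on Windows.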

In [21]:
def submit_paper(title, author, abstract, keywords):
    # create a new row with the provided data
    new_row = pd.DataFrame([[title, author, abstract, keywords]],
                           columns=["title", "author", "abstract", "keywords"])
    
    # Load the CSV file into a pandas DataFrame
    df = pd.read_csv('research_database_clean.csv')    

    # add the new row to the dataframe
    df = pd.concat([df, new_row], ignore_index=True)

    # save the updated dataframe to the CSV file
    df.to_csv('research_database_clean.csv', index=False)

    # show success message
    messagebox.showinfo("Success", "Your paper has been submitted!")


In [22]:
# submit_paper(',',',',',',',')
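`submit_paper` stores whatever it receives, including empty strings. One possible guard, sketched here as an assumption rather than as part of the notebook, is a small validation step that runs before the row is appended; `validate_submission` and `REQUIRED_FIELDS` are hypothetical names.

```python
REQUIRED_FIELDS = ("title", "author", "abstract", "keywords")


def validate_submission(title, author, abstract, keywords):
    """Return a whitespace-trimmed record, or raise ValueError naming the empty fields."""
    record = {
        "title": title.strip(),
        "author": author.strip(),
        "abstract": abstract.strip(),
        "keywords": keywords.strip(),
    }
    missing = [name for name in REQUIRED_FIELDS if not record[name]]
    if missing:
        raise ValueError("Missing required field(s): " + ", ".join(missing))
    return record
```

In the GUI, the caller could catch the `ValueError` and show its message with `messagebox.showerror` instead of saving an incomplete row.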

## GUI

In [23]:
# GUI function to search for a paper
def search_gui():
    def search():
        topic = entry_topic.get()
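        found_papers = search_paper(topic)
        # Remove the widgets left over from a previous search
        for widget in result_frame.winfo_children():
            widget.destroy()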
        if len(found_papers) == 0:
            messagebox.showinfo("Not Found", "No papers found with that topic.")
        else:
            # Create table headers
            headers = ["Title", "Author", "Abstract"]
            for i, header in enumerate(headers):
                header_label = tk.Label(result_frame, text=header, font=("Helvetica", 10, "bold"))
                header_label.grid(row=0, column=i, sticky="nsew", padx=5, pady=5)

            # Create table rows for each paper found
            for i, paper in enumerate(found_papers):
                title_label = tk.Label(result_frame, text=paper["title"], font=("Helvetica", 8))
                title_label.grid(row=i+1, column=0, sticky="nsew", padx=5, pady=5)

                author_label = tk.Label(result_frame, text=paper["author"], font=("Helvetica", 8))
                author_label.grid(row=i+1, column=1, sticky="nsew", padx=5, pady=5)

                abstract_label = tk.Label(result_frame, text=paper["abstract"], font=("Helvetica", 8))
                abstract_label.grid(row=i+1, column=2, sticky="nsew", padx=5, pady=5)

                

    # Create a new Toplevel window for the search interface
    window = tk.Toplevel()
    window.geometry("500x300")
    window.title("Search for a Paper")

    label_topic = tk.Label(window, text="Enter a Topic:", width=20, height=2)
    label_topic.pack()

    entry_topic = tk.Entry(window, width=50)
    entry_topic.pack()

    button_search = tk.Button(window, text="Search", width=10, height=2, command=search)
    button_search.pack()

    # Create a frame to display the search results
    result_frame = tk.Frame(window)
    result_frame.pack(pady=10)

    window.mainloop()


In [24]:
# GUI function to submit a paper
def submit_gui():
    def submit():
        title = entry_title.get()
        author = entry_author.get()
        abstract = entry_abstract.get()
        field = entry_field.get()
        # submit_paper() already shows the success dialog, so no second one is needed here
        submit_paper(title, author, abstract, field)
        window.destroy()

    window = tk.Toplevel()
    window.title("Submit a Paper")

    label_title = tk.Label(window, text="Title:")
    label_title.pack()
    entry_title = tk.Entry(window, width=50)
    entry_title.pack()

    label_author = tk.Label(window, text="Author:")
    label_author.pack()
    entry_author = tk.Entry(window, width=50)
    entry_author.pack()

    label_abstract = tk.Label(window, text="Abstract:")
    label_abstract.pack()
    entry_abstract = tk.Entry(window, width=50)
    entry_abstract.pack()

    label_field = tk.Label(window, text="Field:")
    label_field.pack()
    entry_field = tk.Entry(window, width=50)
    entry_field.pack()

    button_submit = tk.Button(window, text="Submit", command=submit)
    button_submit.pack()

    window.mainloop()


In [None]:
# GUI main function
def main_gui():
    window = tk.Tk()
    window.title("Research Assistant")
     
    window.geometry("500x300")

    # Create a label to display welcome message
    welcome_label = tk.Label(window, text="Welcome To Group 3 Information Retrieval Project", font=("Helvetica", 16))
    welcome_label.pack(pady=20)

    # Create a frame to hold the buttons and center it
    button_frame = tk.Frame(window)
    # place() alone positions the frame; a pack() call before it would simply be overridden
    button_frame.place(relx=0.5, rely=0.5, anchor=tk.S)

    # Create buttons and add to button frame
    button_search = tk.Button(button_frame, text="Search for a Paper", command=search_gui)
    button_search.pack(pady=10)
    button_submit = tk.Button(button_frame, text="Submit a Paper", command=submit_gui)
    button_submit.pack(pady=10)

    window.mainloop()

# Call the main_gui() function to start the program
main_gui()


## Recommendations

Here are some areas of improvement that could be considered:

- Improved User Interface: While the current GUI works, it could be improved to provide a more user-friendly experience. This could include better organization of information, clearer labeling of fields, and more intuitive design.

- Better Search Functionality: The current search function is limited to substring matches of the query words in the title, author, abstract, and keywords fields. It could be improved by using natural language processing techniques to better match user queries with relevant papers.

- More Robust Error Handling: The current code does little explicit error handling; for example, it does not check for a missing CSV file or an empty form field. It could be made more robust to handle a wider range of errors and provide more helpful error messages to users.

- Integration with a Database Management System: While the current code stores papers in a CSV file, it could be improved by integrating with a database management system like MySQL or PostgreSQL. This would allow for faster queries and more efficient storage of data.

- Better Documentation: While the code is relatively well-commented, more detailed documentation could be added to help future developers understand the code and its functionality. This could include things like code examples, detailed explanations of functions, and more.
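As one small step toward the better search functionality suggested in the list above, results could be ordered by how many distinct query words each paper contains instead of being returned in file order. This is a sketch only: `rank_papers` is a hypothetical helper, and it keeps the same simple substring matching as `search_paper`.

```python
def rank_papers(papers, query_words):
    """Order papers by how many distinct query words they contain, best first."""
    fields = ("title", "author", "abstract", "keywords")
    scored = []
    for paper in papers:
        # One lowercase string per paper, built from the searchable fields
        text = " ".join(str(paper.get(field, "")) for field in fields).lower()
        score = sum(1 for word in set(query_words) if word.lower() in text)
        if score > 0:
            scored.append((score, paper))
    # list.sort() is stable, so papers with equal scores keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper for _, paper in scored]
```

A paper that matches three query words would then appear above one that matches a single word, which a plain any-word filter cannot express.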

Overall, these improvements could help make the research paper management system more user-friendly, efficient, and scalable.
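The database recommendation can be prototyped without installing a server. The sketch below uses Python's built-in `sqlite3` module in place of MySQL or PostgreSQL, which is a deliberate substitution; the table layout and the function names `open_database`, `add_paper`, and `find_papers` are assumptions chosen to mirror the CSV columns.

```python
import sqlite3


def open_database(path=":memory:"):
    """Open (or create) the database and make sure the papers table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT NOT NULL, "
        "abstract TEXT NOT NULL, keywords TEXT NOT NULL)"
    )
    return conn


def add_paper(conn, title, author, abstract, keywords):
    """Insert one paper; the with-block commits on success and rolls back on error."""
    with conn:
        conn.execute(
            "INSERT INTO papers (title, author, abstract, keywords) VALUES (?, ?, ?, ?)",
            (title, author, abstract, keywords),
        )


def find_papers(conn, word):
    """Return rows in which the word occurs in any of the four text columns."""
    # LIKE is case-insensitive for ASCII letters in SQLite; % and _ act as wildcards
    pattern = f"%{word}%"
    rows = conn.execute(
        "SELECT title, author, abstract, keywords FROM papers "
        "WHERE title LIKE ? OR author LIKE ? OR abstract LIKE ? OR keywords LIKE ?",
        (pattern,) * 4,
    )
    return rows.fetchall()
```

The `?` placeholders keep user input out of the SQL text, which matters once queries come straight from a form field.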