Automated-Data-Query-and-Retrieval-System-Using-Offline-free-open-source-

Step 1: Setup Environment Install the required Python libraries:

bash Copy pip install pandas pymongo langchain llama-index transformers Ensure MongoDB is installed and running locally or remotely. Update the connection string in the script accordingly.

Step 2: Python Script Below is the Python script that covers all the requirements:

----/python import pandas as pd from pymongo import MongoClient from langchain.chains import LLMChain from langchain.prompts import PromptTemplate from langchain.llms import HuggingFacePipeline from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline import warnings

Suppress warnings

warnings.filterwarnings("ignore")

Step 1: Load CSV data into MongoDB

def load_csv_to_mongodb(csv_file, db_name, collection_name, mongo_uri="mongodb://localhost:27017/"): """ Load CSV data into MongoDB. """ # Read CSV file df = pd.read_csv(csv_file)

# Connect to MongoDB
client = MongoClient(mongo_uri)
db = client[db_name]
collection = db[collection_name]

# Insert data into MongoDB
records = df.to_dict("records")
collection.insert_many(records)
print(f"Data loaded into MongoDB collection '{collection_name}'.")

Step 2: Generate MongoDB query using LLM

def generate_mongodb_query(user_input, column_name): """ Generate a MongoDB query using an open-source LLM. """ # Load an open-source LLM (e.g., GPT-Neo) model_name = "EleutherAI/gpt-neo-1.3B" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained(model_name) pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_length=100) llm = HuggingFacePipeline(pipeline=pipe)

# Define prompt template
prompt_template = PromptTemplate(
    input_variables=["user_input", "column_name"],
    template="Generate a MongoDB query to {user_input} for the column '{column_name}'."
)

# Create LLM chain
chain = LLMChain(llm=llm, prompt=prompt_template)
query = chain.run(user_input=user_input, column_name=column_name)
return query

Step 3: Execute MongoDB query and retrieve data

def execute_mongodb_query(query, db_name, collection_name, mongo_uri="mongodb://localhost:27017/"): """ Execute MongoDB query and retrieve data. """ # Connect to MongoDB client = MongoClient(mongo_uri) db = client[db_name] collection = db[collection_name]

# Execute query
try:
    result = list(eval(query))  # Convert string query to executable code
    return result
except Exception as e:
    print(f"Error executing query: {e}")
    return None

Step 4: Display or save retrieved data

def display_or_save_data(data, save_file=None): """ Display or save retrieved data. """ if data: df = pd.DataFrame(data) if save_file: df.to_csv(save_file, index=False) print(f"Data saved to {save_file}.") else: print(df) else: print("No data retrieved.")

Main function

def main(): # Step 1: Load CSV data into MongoDB csv_file = "sample_data.csv" # Replace with your CSV file db_name = "product_db" collection_name = "products" load_csv_to_mongodb(csv_file, db_name, collection_name)

# Step 2: User interaction
user_input = input("Enter your query (e.g., 'find products with price greater than 50'): ")
column_name = input("Enter the column name (e.g., 'Price'): ")

# Generate MongoDB query
query = generate_mongodb_query(user_input, column_name)
print(f"Generated Query: {query}")

# Step 3: Execute query and retrieve data
data = execute_mongodb_query(query, db_name, collection_name)

# Step 4: Display or save data
save_option = input("Do you want to save the data? (yes/no): ").lower()
if save_option == "yes":
    save_file = input("Enter the file name to save (e.g., test_case1.csv): ")
    display_or_save_data(data, save_file)
else:
    display_or_save_data(data)

if name == "main": main() Step 3: Documentation README.md markdown Copy

Automated Data Query and Retrieval System

This system automates data query and retrieval using offline, open-source Large Language Models (LLMs) with CSV, MongoDB, LlamaIndex, and LangChain.

Setup

Install the required Python libraries:

pip install pandas pymongo langchain llama-index transformers

Ensure MongoDB is installed and running.

Usage Place your CSV file in the same directory as the script and update the csv_file variable in the script.

Run the script:

bash Copy python main.py Follow the prompts to input your query and column name.

Choose to display or save the retrieved data.

Test Cases Find all products with a rating below 4.5 that have more than 200 reviews and are offered by the brand 'Nike' or 'Sony'.

Which products in the Electronics category have a rating of 4.5 or higher and are in stock?

List products launched after January 1, 2022, in the Home & Kitchen or Sports category.

Output Query results are displayed in the console or saved as CSV files.

Generated queries are logged in Queries_generated.txt.

Step 4: Test Cases

Test Case 1

Input: "Find all products with a rating below 4.5 that have more than 200 reviews and are offered by the brand 'Nike' or 'Sony'."
Generated Query: db.products.find({ "Rating": { "$lt": 4.5 }, "Reviews": { "$gt": 200 }, "Brand": { "$in": ["Nike", "Sony"] } })
Output: Saved as test_case1.csv.

Test Case 2

Input: "Which products in the Electronics category have a rating of 4.5 or higher and are in stock?"
Generated Query: db.products.find({ "Category": "Electronics", "Rating": { "$gte": 4.5 }, "InStock": True })
Output: Saved as test_case2.csv.

Test Case 3

Input: "List products launched after January 1, 2022, in the Home & Kitchen or Sports category."
Generated Query: db.products.find({ "LaunchDate": { "$gt": "2022-01-01" }, "Category": { "$in": ["Home & Kitchen", "Sports"] } })
Output: Saved as test_case3.csv.

Step 5: Deliverables

Python script (main.py).
Documentation (README.md).
Sample CSV file (sample_data.csv).
Generated queries (Queries_generated.txt).
Output CSV files for test cases (test_case1.csv, test_case2.csv, test_case3.csv).

This solution provides a robust, scalable, and efficient system for automated data query and retrieval using offline LLMs.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
README.md		README.md
app.py		app.py
load_csv_to_mongodb.py		load_csv_to_mongodb.py
sample_data.csv		sample_data.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Automated-Data-Query-and-Retrieval-System-Using-Offline-free-open-source-

Suppress warnings

Step 1: Load CSV data into MongoDB

Step 2: Generate MongoDB query using LLM

Step 3: Execute MongoDB query and retrieve data

Step 4: Display or save retrieved data

Main function

Automated Data Query and Retrieval System

Setup

Step 4: Test Cases

Test Case 1

Test Case 2

Test Case 3

Step 5: Deliverables

About

Uh oh!

Releases

Packages

Languages

vinaykumarreddyr/Automated-Data-Query-and-Retrieval-System-Using-Offline-free-open-source-

Folders and files

Latest commit

History

Repository files navigation

Automated-Data-Query-and-Retrieval-System-Using-Offline-free-open-source-

Suppress warnings

Step 1: Load CSV data into MongoDB

Step 2: Generate MongoDB query using LLM

Step 3: Execute MongoDB query and retrieve data

Step 4: Display or save retrieved data

Main function

Automated Data Query and Retrieval System

Setup

Step 4: Test Cases

Test Case 1

Test Case 2

Test Case 3

Step 5: Deliverables

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages