Task 1: RAG and Text-to-SQL #95
Walkthrough
The pull request introduces a Text-to-SQL Query Interface project. A new README provides an overview of integrating Retrieval-Augmented Generation (RAG) with SQL query generation, while detailed installation and usage instructions are included. The new
Changes
Sequence Diagram(s)
sequenceDiagram
participant U as User
participant UI as Streamlit UI
participant CA as CodeAgent
participant SE as sql_engine
participant DB as SQLite DB
U->>UI: Selects predefined or enters a custom query
UI->>CA: Initializes CodeAgent with SQL engine
CA->>SE: Sends query for execution
SE->>DB: Executes query on city_stats table
DB-->>SE: Returns query results
SE-->>CA: Forwards results
CA-->>UI: Delivers query response
UI-->>U: Displays results
Actionable comments posted: 5
🧹 Nitpick comments (10)
text-to-sql-Shuchi/requirements.txt (1)
3-3: Consider pinning SQLAlchemy minor version.
SQLAlchemy follows semantic versioning, and the current version 2.0.25 might be outdated. Consider a compatible-release specifier, which still accepts patch-level security updates within 2.0.x while maintaining compatibility.
-sqlalchemy==2.0.25
+sqlalchemy~=2.0.25
text-to-sql-Shuchi/README.md (4)
24-26: Fix Markdown formatting.
The code span formatting has extra spaces and the code block is missing a language specifier.
-Create a
-```.streamlit/secrets.toml ``` file and add:
+Create a ```.streamlit/secrets.toml``` file and add:
-```
+```toml
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
25-25: Spaces inside code span elements
null(MD038, no-space-in-code)
38-41: Fix heading format and content.
This section has formatting inconsistencies and should provide more details about compatibility requirements.
- - Install Dependencies
- ---
- Ensure you have Python 3.11 or later installed. Must check package versions have no any conflicting dependencies.Then, install the required dependencies:
+### 2. Install Dependencies
+
+Ensure you have Python 3.11 or later installed. Check that package versions have no conflicting dependencies. Then, install the required dependencies:
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
38-38: Heading style
Expected: atx; Actual: setext(MD003, heading-style)
38-38: Headings must start at the beginning of the line
null(MD023, heading-start-left)
42-44: Add language specifier to code block.
The code block is missing a language specifier.
-```
+```bash
 pip install -r requirements.txt
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
42-42: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
47-52: Fix section formatting and add language specifier.
The "Run the App" section is missing proper heading formatting and the code block needs a language specifier.
- Run the App
-Run the Streamlit app using the following command:
+### 3. Run the App
+
+Run the Streamlit app using the following command:
-```
+```bash
 streamlit run app.py
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
50-50: Fenced code blocks should have a language specified
null(MD040, fenced-code-language)
text-to-sql-Shuchi/app.py (5)
13-20: Consider adding more columns and index for improved queries.
The current table structure is simple, but adding additional columns like `country` and an index on the `state` column would improve query performance for state-based searches.
 table_name = "city_stats"
 city_stats_table = Table(
     table_name,
     metadata_obj,
     Column("city_name", String(16), primary_key=True),
     Column("population", Integer),
     Column("state", String(16), nullable=False),
+    Column("country", String(16), nullable=False, default="USA"),
 )
+# Create an index on the state column for faster queries
+from sqlalchemy import Index
+Index(f"ix_{table_name}_state", city_stats_table.c.state)
25-32: Consider using a separate function for data initialization.
For improved code organization and maintainability, consider moving the sample data initialization to a separate function. Also, consider adding more diverse city data.
-# Insert sample data into the table
-rows = [
-    {"city_name": "New York City", "population": 8336000, "state": "New York"},
-    {"city_name": "Los Angeles", "population": 3822000, "state": "California"},
-    {"city_name": "Chicago", "population": 2665000, "state": "Illinois"},
-    {"city_name": "Houston", "population": 2303000, "state": "Texas"},
-    {"city_name": "Miami", "population": 449514, "state": "Florida"},
-    {"city_name": "Seattle", "population": 749256, "state": "Washington"},
-]
+def initialize_sample_data():
+    """Initialize the database with sample city data."""
+    rows = [
+        {"city_name": "New York City", "population": 8336000, "state": "New York"},
+        {"city_name": "Los Angeles", "population": 3822000, "state": "California"},
+        {"city_name": "Chicago", "population": 2665000, "state": "Illinois"},
+        {"city_name": "Houston", "population": 2303000, "state": "Texas"},
+        {"city_name": "Miami", "population": 449514, "state": "Florida"},
+        {"city_name": "Seattle", "population": 749256, "state": "Washington"},
+        {"city_name": "Austin", "population": 964254, "state": "Texas"},
+        {"city_name": "San Francisco", "population": 874961, "state": "California"},
+    ]
+
+    for row in rows:
+        stmt = insert(city_stats_table).values(**row)
+        with engine.begin() as connection:
+            connection.execute(stmt)
+
+# Initialize the database
+initialize_sample_data()
80-86: Make predefined queries more natural language oriented.
The current predefined queries are written as natural language questions, but they could be improved to demonstrate the system's ability to handle more complex, conversational queries.
 predefined_queries = {
-    "What are the different states?": "What are the different states?",
-    "What state is Houston located in?": "What state is Houston located in?",
-    "Which city has the largest population?": "Which city has the largest population?",
-    "What is the population of Miami?": "What is the population of Miami?",
+    "Show me all the states with cities in our database": "What are the different states?",
+    "I need to know which state Houston is in": "What state is Houston located in?",
+    "Which is the most populous city in our database?": "Which city has the largest population?",
+    "Tell me how many people live in Miami": "What is the population of Miami?",
+    "Show me cities in Texas with populations over 1 million": "Which cities in Texas have a population greater than 1 million?",
+    "Compare the populations of California cities": "Compare the populations of all cities in California",
 }
97-97: Consider providing query transparency for debugging.
The commented-out line for displaying the executing query would be useful for debugging and user transparency. Consider making this visible but with a toggle option.
-# st.write(f"**Executing Query:** `{query_prompt}`")
+    show_query = st.checkbox("Show executed SQL", value=False, key="show_faq_query")
+    if show_query:
+        st.write(f"**Natural Language Query:** `{query_prompt}`")
+        # You could also show the actual SQL query generated by the agent
114-114: Apply consistent UI pattern for showing queries.
Similar to the earlier commented-out line, this should follow the same pattern for consistency. Use the same toggle approach for displaying the query.
-    # st.write(f"**Executing Query:** `{custom_query}`")
+    show_query = st.checkbox("Show executed SQL", value=False, key="show_custom_query")
+    if show_query:
+        st.write(f"**Natural Language Query:** `{custom_query}`")
+        # You could also show the actual SQL query generated by the agent
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
text-to-sql-Shuchi/Flowchart.png is excluded by !**/*.png
text-to-sql-Shuchi/response_city_explorer-.mp4 is excluded by !**/*.mp4
📒 Files selected for processing (3)
text-to-sql-Shuchi/README.md (1 hunks)
text-to-sql-Shuchi/app.py (1 hunks)
text-to-sql-Shuchi/requirements.txt (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
text-to-sql-Shuchi/README.md
25-25: Spaces inside code span elements
null
(MD038, no-space-in-code)
27-27: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
38-38: Heading style
Expected: atx; Actual: setext
(MD003, heading-style)
38-38: Headings must start at the beginning of the line
null
(MD023, heading-start-left)
42-42: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
50-50: Fenced code blocks should have a language specified
null
(MD040, fenced-code-language)
🪛 Ruff (0.8.2)
text-to-sql-Shuchi/app.py
1-1: os
imported but unused
Remove unused import: os
(F401)
3-3: sqlalchemy.inspect
imported but unused
Remove unused import: sqlalchemy.inspect
(F401)
65-65: Undefined name HfApiModel
(F821)
🔇 Additional comments (1)
text-to-sql-Shuchi/requirements.txt (1)
2-2: ❓ Verification inconclusive
Validate compatibility with smolagents.
The package smolagents at version 0.1.0 is relatively new and might have limited compatibility or documentation. Consider verifying that it supports all the features you're using, especially with the specific LiteLLM version.
🏁 Script executed:
#!/bin/bash
# Check if smolagents is compatible with LiteLLM
pip show smolagents | grep -A 5 "Requires:"
Length of output: 85
Manual Verification Required: Confirm Installation of smolagents and Its Compatibility with LiteLLM
The automated check using pip show smolagents returned a warning that the package wasn't found. This makes it difficult to infer details about its dependencies or compatibility automatically. Please manually verify that:
- smolagents==0.1.0 is correctly installed in your environment.
- The installed package (or its documentation) confirms compatibility with the specific features of LiteLLM you are using.
If smolagents isn't installed or the package name is incorrect, adjust the dependency accordingly. Otherwise, consult the official documentation for any potential compatibility caveats.
text-to-sql-Shuchi/requirements.txt
Outdated
streamlit==1.32.0
smolagents==0.1.0
sqlalchemy==2.0.25
litellm==0.1.0
pillow==10.3.0 # Explicitly specify a compatible version of Pillow and others
💡 Verification agent
🧩 Analysis chain
Ensure consistent litellm version.
The litellm version 0.1.0 specified appears to be outdated. The current version on PyPI is much higher (around 1.30.x as of March 2025). This older version might not be compatible with the Gemini Flash 2.0 model mentioned in app.py.
Consider updating to the latest stable version. For example:
-litellm==0.1.0
+litellm==1.30.1
🌐 Web query:
What is the latest version of litellm?
💡 Result:
The latest version of LiteLLM is 1.63.8, released on March 13, 2025[4][7]. This nightly build includes updates such as new model support (e.g., Jamba models), error log visibility in the UI, and fixes for Bedrock Claude's tool calling transformations[3][4].
Key updates in recent releases include:
- Enhanced error handling with new exception types (APIConnectionError replacing APIError)
- Improved streaming response handling for empty chunks
- Token management improvements (get_max_tokens now returns integers instead of dictionaries)[2][4]
For Python installations, use:
pip install litellm==1.63.8
The Docker image for this version is available as ghcr.io/berriai/litellm:main-latest[6][8]. Developers should note the minimum required OpenAI client version is now >=1.0.0[2][7].
Citations:
- 1: https://www.restack.io/p/litellm-answer-releases-cat-ai
- 2: https://www.restack.io/p/litellm-answer-version-updates-cat-ai
- 3: https://docs.litellm.ai/release_notes
- 4: https://github.com/BerriAI/litellm/releases
- 5: https://docs.litellm.ai/docs/
- 6: https://docs.litellm.ai
- 7: https://pypi.org/project/litellm/
- 8: https://github.com/orgs/berriai/packages/container/litellm/336137937?tag=main-latest
- 9: https://pypi.org/project/llama-index-llms-litellm/
- 10: https://www.litellm.ai
Action: Update litellm version in requirements.txt
- The current requirement (litellm==0.1.0) is outdated.
- The latest stable version, as of March 2025, is 1.63.8.
- Update the dependency to ensure compatibility with the Gemini Flash 2.0 model in app.py.
Suggested change:
-litellm==0.1.0
+litellm==1.63.8
text-to-sql-Shuchi/app.py
Outdated
@tool
def sql_engine(query: str) -> str:
    """
    Executes the generated SQL query on the 'city_stats' table and returns the results.
    Args:
        query: The query to perform. This should be correct SQL.
    Returns:
        A string representation of the query results.
    """
    output = ""
    with engine.connect() as con:
        rows = con.execute(text(query))
        for row in rows:
            output += "\n" + str(row)
    return output
🛠️ Refactor suggestion
Improve SQL execution function with better error handling and formatting.
The current SQL engine function lacks error handling for SQL syntax errors and proper formatting of results.
 @tool
 def sql_engine(query: str) -> str:
     """
     Executes the generated SQL query on the 'city_stats' table and returns the results.
     Args:
         query: The query to perform. This should be correct SQL.
     Returns:
         A string representation of the query results.
     """
-    output = ""
-    with engine.connect() as con:
-        rows = con.execute(text(query))
-        for row in rows:
-            output += "\n" + str(row)
-    return output
+    try:
+        with engine.connect() as con:
+            result = con.execute(text(query))
+            columns = result.keys()
+            rows = result.fetchall()
+
+            if not rows:
+                return "No results found."
+
+            # Format results as a table
+            output = "\n| " + " | ".join(columns) + " |"
+            output += "\n|" + "---|" * len(columns)
+
+            for row in rows:
+                output += "\n| " + " | ".join(str(val) for val in row) + " |"
+
+            return output
+    except Exception as e:
+        return f"Error executing SQL query: {str(e)}"
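The table formatting and error path in this suggestion can be tried in isolation. The following is a minimal sketch that assumes the standard-library sqlite3 module in place of the SQLAlchemy engine used in app.py; the helper names rows_to_markdown and run_query are hypothetical and do not appear in the PR.

```python
import sqlite3


def rows_to_markdown(columns, rows):
    """Render column names and row tuples as a Markdown-style table string."""
    if not rows:
        return "No results found."
    header = "| " + " | ".join(columns) + " |"
    divider = "|" + "---|" * len(columns)
    body = ["| " + " | ".join(str(value) for value in row) + " |" for row in rows]
    return "\n".join([header, divider, *body])


def run_query(con, query):
    """Execute a query and return a formatted table, or an error string on failure."""
    try:
        cur = con.execute(query)
        # cursor.description is None for statements that return no result set
        columns = [d[0] for d in cur.description] if cur.description else []
        return rows_to_markdown(columns, cur.fetchall())
    except sqlite3.Error as exc:
        return f"Error executing SQL query: {exc}"


# Hypothetical in-memory stand-in for the city_stats table
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE city_stats (city_name TEXT PRIMARY KEY, population INTEGER, state TEXT)"
)
con.execute("INSERT INTO city_stats VALUES ('Miami', 449514, 'Florida')")
table = run_query(con, "SELECT city_name, population FROM city_stats")
```

Returning the error as a string, rather than raising, lets the calling agent read the message and retry with corrected SQL.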
try:
    result = agent.run(custom_query)
    st.write("**Query Result:**")
    st.write(result)
except Exception as e:
    st.error(f"An error occurred: {e}")
🛠️ Refactor suggestion
Add more detailed error handling.
The current error handling doesn't provide specific feedback for different types of errors that might occur. Consider adding more detailed error handling to provide better guidance to users.
 try:
     result = agent.run(custom_query)
     st.write("**Query Result:**")
     st.write(result)
 except Exception as e:
-    st.error(f"An error occurred: {e}")
+    error_msg = str(e)
+    if "api key" in error_msg.lower():
+        st.error("Error: API key issue. Please check your Gemini API key configuration.")
+    elif "syntax" in error_msg.lower():
+        st.error(f"SQL syntax error: {error_msg}")
+        st.info("Try rephrasing your question in a clearer way.")
+    elif "timeout" in error_msg.lower():
+        st.error("Request timed out. Please try again.")
+    else:
+        st.error(f"An error occurred: {error_msg}")
+        st.info("If this error persists, contact support.")
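When a message is lowercased before matching, the substrings compared against it must be lowercase as well, otherwise a mixed-case needle such as "API key" can never match. A minimal sketch of the classification step, kept free of Streamlit so it can be tested on its own; classify_error and its return labels are hypothetical names, not part of the PR.

```python
def classify_error(message: str) -> str:
    """Map an exception message to a coarse category using lowercase substring checks."""
    lowered = message.lower()
    if "api key" in lowered:
        return "api_key"
    if "syntax" in lowered:
        return "syntax"
    if "timeout" in lowered or "timed out" in lowered:
        return "timeout"
    return "unknown"


category = classify_error("Invalid API key provided")
```

The UI layer can then pick the st.error or st.info message from the returned category, which keeps the matching logic in one place.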
else:
    model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")
    print("Using Qwen 2.5 Coder")
Fix undefined HfApiModel import.
The code references HfApiModel, which is not imported. This will cause a runtime error if the model name doesn't start with "Gemini".
 import streamlit as st
 from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, text, insert, inspect
-from smolagents import tool, CodeAgent, LiteLLMModel
+from smolagents import tool, CodeAgent, LiteLLMModel, HfApiModel
If HfApiModel is not available in smolagents, you may need to implement an alternative fallback:
 else:
-    model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")
-    print("Using Qwen 2.5 Coder")
+    # Fallback to a different model if Gemini is not available
+    try:
+        from smolagents import HfApiModel
+        model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")
+        print("Using Qwen 2.5 Coder")
+    except ImportError:
+        st.error("HfApiModel not available and Gemini model not selected. Please check your configuration.")
+        model = None
🧰 Tools
🪛 Ruff (0.8.2)
65-65: Undefined name HfApiModel
(F821)
import os
import streamlit as st
from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, text, insert, inspect
from smolagents import tool, CodeAgent, LiteLLMModel

# Access the API key from Streamlit secrets
GEMINI_API_KEY = st.secrets["GEMINI_API_KEY"]
Remove unused import and add environment variable fallback.
The os module is imported but never used. Also, consider adding a fallback for loading the API key from environment variables when Streamlit secrets are unavailable.
-import os
 import streamlit as st
 from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, text, insert, inspect
 from smolagents import tool, CodeAgent, LiteLLMModel
+import os
 # Access the API key from Streamlit secrets
-GEMINI_API_KEY = st.secrets["GEMINI_API_KEY"]
+try:
+    GEMINI_API_KEY = st.secrets["GEMINI_API_KEY"]
+except Exception:
+    GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
+    if not GEMINI_API_KEY:
+        st.error("GEMINI_API_KEY not found in Streamlit secrets or environment variables.")
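The secrets-then-environment lookup can also be written as a small pure function, which makes the fallback order easy to test without a running Streamlit app. This is a sketch under that assumption; resolve_api_key is a hypothetical helper, and plain dictionaries stand in for st.secrets and os.environ.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    secrets: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
    name: str = "GEMINI_API_KEY",
) -> Optional[str]:
    """Return the key from the secrets mapping, else from the environment, else None."""
    try:
        value = secrets[name]
    except Exception:
        # A missing key or an unreadable secrets source falls through to the environment
        value = None
    return value or environ.get(name)


key = resolve_api_key({"GEMINI_API_KEY": "from-secrets"}, {"GEMINI_API_KEY": "from-env"})
```

In the app, the call would presumably be resolve_api_key(st.secrets), with st.error shown when the result is None.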
🧰 Tools
🪛 Ruff (0.8.2)
1-1: os
imported but unused
Remove unused import: os
(F401)
3-3: sqlalchemy.inspect
imported but unused
Remove unused import: sqlalchemy.inspect
(F401)
Actionable comments posted: 1
🧹 Nitpick comments (4)
text-to-sql-Shuchi/app.py (4)
34-37: Optimize database insertion with bulk operations.
The current approach of inserting rows one by one is inefficient. Consider using a bulk insert operation for better performance.
-for row in rows:
-    stmt = insert(city_stats_table).values(**row)
-    with engine.begin() as connection:
-        connection.execute(stmt)
+# Bulk insert all rows at once
+with engine.begin() as connection:
+    connection.execute(insert(city_stats_table), rows)
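The same idea, one transaction and one multi-row statement, can be illustrated without SQLAlchemy. The sketch below assumes the standard-library sqlite3 module and an in-memory database as a stand-in for the app's engine; the three sample rows are taken from the data in this PR.

```python
import sqlite3

rows = [
    {"city_name": "New York City", "population": 8336000, "state": "New York"},
    {"city_name": "Los Angeles", "population": 3822000, "state": "California"},
    {"city_name": "Chicago", "population": 2665000, "state": "Illinois"},
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE city_stats ("
    "city_name TEXT PRIMARY KEY, population INTEGER, state TEXT NOT NULL)"
)

# One transaction, one executemany call: named placeholders are filled from each dict
with con:
    con.executemany(
        "INSERT INTO city_stats VALUES (:city_name, :population, :state)", rows
    )

count = con.execute("SELECT COUNT(*) FROM city_stats").fetchone()[0]
largest = con.execute(
    "SELECT city_name FROM city_stats ORDER BY population DESC LIMIT 1"
).fetchone()[0]
```

For six sample rows the speed difference is negligible; the main gain is that the rows are committed together rather than in six separate transactions.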
110-110: Remove commented-out debug code.
There are commented lines that were likely used for debugging. These should be either properly implemented or removed for cleaner code.
-# st.write(f"**Executing Query:** `{query_prompt}`")
- # st.write(f"**Executing Query:** `{custom_query}`")
Also applies to: 127-127
93-99: Enhance predefined queries with descriptions.
The current predefined queries implementation could be improved by separating the user-friendly descriptions from the actual queries.
 # Predefined queries
 st.subheader("🔍FAQ Queries")
 predefined_queries = {
-    "What are the different states?": "What are the different states?",
-    "What state is Houston located in?": "What state is Houston located in?",
-    "Which city has the largest population?": "Which city has the largest population?",
-    "What is the population of Miami?": "What is the population of Miami?",
+    "List all states": "What are the different states?",
+    "Find Houston's state": "What state is Houston located in?",
+    "Largest city by population": "Which city has the largest population?",
+    "Miami's population": "What is the population of Miami?",
 }
86-90: Add sample queries to guide users.
The application description could benefit from some examples of queries users can try.
 st.title("Welcome to the City Explorer! 🏙️")
 st.markdown("""
 This app is like a smart assistant that helps you find information about cities. You can ask questions about cities, states, and populations.
+
+**Try asking questions like:**
+- "Which cities are in California?"
+- "What's the average population of cities in Texas?"
+- "Show me cities with population greater than 1 million"
 """)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
text-to-sql-Shuchi/app.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
text-to-sql-Shuchi/app.py
1-1: os
imported but unused
Remove unused import: os
(F401)
3-3: sqlalchemy.inspect
imported but unused
Remove unused import: sqlalchemy.inspect
(F401)
78-78: Undefined name HfApiModel
(F821)
🔇 Additional comments (3)
text-to-sql-Shuchi/app.py (3)
1-7: Remove unused imports and add environment variable fallback for API key.
The os import is not used, and the API key handling lacks a fallback mechanism for environments where Streamlit secrets might not be available.
-import os
 import streamlit as st
-from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, text, insert, inspect
+from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, text, insert
 from smolagents import tool, CodeAgent, LiteLLMModel
+import os
 # Access the API key from Streamlit secrets
-GEMINI_API_KEY = st.secrets["GEMINI_API_KEY"]
+try:
+    GEMINI_API_KEY = st.secrets["GEMINI_API_KEY"]
+except Exception:
+    GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
+    if not GEMINI_API_KEY:
+        st.error("GEMINI_API_KEY not found in Streamlit secrets or environment variables.")
🧰 Tools
🪛 Ruff (0.8.2)
1-1: os imported but unused
Remove unused import: os
(F401)
3-3: sqlalchemy.inspect imported but unused
Remove unused import: sqlalchemy.inspect
(F401)
77-79: Fix undefined HfApiModel import.
The code references HfApiModel, which is not imported. This will cause a runtime error if the model name doesn't start with "Gemini".
 import streamlit as st
 from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, text, insert, inspect
-from smolagents import tool, CodeAgent, LiteLLMModel
+from smolagents import tool, CodeAgent, LiteLLMModel, HfApiModel
If HfApiModel is not available in smolagents, you may need to implement an alternative fallback:
 else:
-    model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")
-    print("Using Qwen 2.5 Coder")
+    # Fallback to a different model if Gemini is not available
+    try:
+        from smolagents import HfApiModel
+        model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")
+        print("Using Qwen 2.5 Coder")
+    except ImportError:
+        st.error("HfApiModel not available and Gemini model not selected. Please check your configuration.")
+        model = None
🧰 Tools
🪛 Ruff (0.8.2)
78-78: Undefined name HfApiModel
(F821)
128-133: Add more detailed error handling.
The current error handling doesn't provide specific feedback for different types of errors that might occur. Consider adding more detailed error handling to provide better guidance to users.
 try:
     result = agent.run(custom_query)
     st.write("**Query Result:**")
     st.write(result)
 except Exception as e:
-    st.error(f"An error occurred: {e}")
+    error_msg = str(e)
+    if "api key" in error_msg.lower():
+        st.error("Error: API key issue. Please check your Gemini API key configuration.")
+    elif "syntax" in error_msg.lower():
+        st.error(f"SQL syntax error: {error_msg}")
+        st.info("Try rephrasing your question in a clearer way.")
+    elif "timeout" in error_msg.lower():
+        st.error("Request timed out. Please try again.")
+    else:
+        st.error(f"An error occurred: {error_msg}")
+        st.info("If this error persists, contact support.")
def sql_engine(query: str) -> str:
    """
    Executes the generated SQL query on the 'city_stats' table and returns the results.
    Args:
        query: The query to perform. This should be correct SQL.
    Returns:
        A string representation of the query results.
    """
    try:
        with engine.connect() as con:
            result = con.execute(text(query))
            columns = result.keys()
            rows = result.fetchall()

            if not rows:
                return "No results found."

            # Format results as a table
            output = "\n| " + " | ".join(columns) + " |"
            output += "\n|" + "---|" * len(columns)

            for row in rows:
                output += "\n| " + " | ".join(str(val) for val in row) + " |"

            return output
    except Exception as e:
        return f"Error executing SQL query: {str(e)}"
🛠️ Refactor suggestion
Add SQL injection protection to the sql_engine function.
While the function has good error handling, it doesn't protect against SQL injection attacks when executed with user input.
Since the function is used to execute LLM-generated SQL from user queries, consider adding a validation step:
 @tool
 def sql_engine(query: str) -> str:
     """
     Executes the generated SQL query on the 'city_stats' table and returns the results.
     Args:
         query: The query to perform. This should be correct SQL.
     Returns:
         A string representation of the query results.
     """
+    # Validate query to prevent SQL injection
+    allowed_keywords = ["SELECT", "FROM", "WHERE", "ORDER BY", "GROUP BY", "HAVING", "LIMIT", "JOIN", "AND", "OR", "NOT", "LIKE", "IN", "IS"]
+    query_upper = query.upper()
+    if any(keyword not in allowed_keywords and keyword in query_upper for keyword in ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE", "TRUNCATE", "GRANT", "REVOKE", "ATTACH"]):
+        return "Error executing SQL query: Operation not allowed. Only SELECT queries are permitted."
+
     try:
         with engine.connect() as con:
             result = con.execute(text(query))
             columns = result.keys()
             rows = result.fetchall()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def sql_engine(query: str) -> str:
    """
    Executes the generated SQL query on the 'city_stats' table and returns the results.
    Args:
        query: The query to perform. This should be correct SQL.
    Returns:
        A string representation of the query results.
    """
    # Validate query to prevent SQL injection
    allowed_keywords = ["SELECT", "FROM", "WHERE", "ORDER BY", "GROUP BY", "HAVING", "LIMIT", "JOIN", "AND", "OR", "NOT", "LIKE", "IN", "IS"]
    query_upper = query.upper()
    if any(keyword not in allowed_keywords and keyword in query_upper for keyword in ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE", "TRUNCATE", "GRANT", "REVOKE", "ATTACH"]):
        return "Error executing SQL query: Operation not allowed. Only SELECT queries are permitted."
    try:
        with engine.connect() as con:
            result = con.execute(text(query))
            columns = result.keys()
            rows = result.fetchall()
            if not rows:
                return "No results found."
            # Format results as a table
            output = "\n| " + " | ".join(columns) + " |"
            output += "\n|" + "---|" * len(columns)
            for row in rows:
                output += "\n| " + " | ".join(str(val) for val in row) + " |"
            return output
    except Exception as e:
        return f"Error executing SQL query: {str(e)}"
My Approach:
I implemented the City Explorer app using SQLAlchemy for database management, LiteLLM (Gemini 2.0 Flash) for natural-language-to-SQL conversion, and Streamlit for the UI.
This approach focuses on simplicity and efficiency, leveraging lightweight tools to achieve the same task with less code and minimal configuration.
User → LiteLLM → SQL Query → SQLAlchemy → Database → Results → User
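The flow above can be illustrated end to end with a small, self-contained sketch. Several things here are assumptions rather than the project's code: the standard-library `sqlite3` module stands in for the SQLAlchemy engine so the example runs without extra dependencies, `fake_text_to_sql` is a hypothetical stub replacing the LiteLLM call, and the table columns and rows are made-up sample data.

```python
import sqlite3
from typing import Callable, Sequence


def fake_text_to_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM step; a real app would call the model here."""
    return "SELECT city_name, population FROM city_stats ORDER BY population DESC LIMIT 1"


def format_rows(columns: Sequence[str], rows: Sequence[tuple]) -> str:
    """Render rows as a Markdown-style table, as the sql_engine tool does."""
    if not rows:
        return "No results found."
    output = "\n| " + " | ".join(columns) + " |"
    output += "\n|" + "---|" * len(columns)
    for row in rows:
        output += "\n| " + " | ".join(str(val) for val in row) + " |"
    return output


def answer(question: str, con: sqlite3.Connection,
           to_sql: Callable[[str], str] = fake_text_to_sql) -> str:
    """User question -> SQL query -> database -> formatted results."""
    query = to_sql(question)
    try:
        cur = con.execute(query)
        columns = [desc[0] for desc in cur.description]
        return format_rows(columns, cur.fetchall())
    except sqlite3.Error as e:
        return f"Error executing SQL query: {e}"


# Made-up sample data in an in-memory database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE city_stats (city_name TEXT, population INTEGER)")
con.executemany("INSERT INTO city_stats VALUES (?, ?)",
                [("Alphaville", 100), ("Betatown", 250)])

result = answer("Which city has the largest population?", con)
print(result)
```

Passing the translation step in as a function keeps the database logic testable without an API key; the real model call can be swapped in through the `to_sql` parameter.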
Why This Method?
Due to some configuration challenges with LlamaIndex, Ollama/vLLM, and Qdrant, I opted for a more straightforward approach using SQLAlchemy and LiteLLM.
This allowed me to quickly prototype and deliver a functional app without the overhead of additional setup.
Advantages:
Post content: https://typefully.com/t/HEEq0UT
Streamlit UI: https://www.canva.com/design/DAGhue4_LeI/sEMcO4zqBtOaz_sbFzLniA/watch?utm_content=DAGhue4_LeI&utm_campaign=designshare&utm_medium=link2&utm_source=uniquelinks&utlId=hd7ff32ef2d
Workflow: https://drive.google.com/file/d/1aDHF05V5kE73mf1NyUJ3nln3fxIyY2ql/view?usp=sharing
I would appreciate any feedback on this solution.
I enjoyed working on this task. 🚀
Summary by CodeRabbit
New Features
Documentation
requirements.txt file detailing project dependencies.