thisManHere/ai-dataquality-copilot


🔬 AI Data Quality Copilot

A data engineering project, built with Python, Pandas, SQL, and Claude AI, that automatically detects and explains data quality issues.


ai_dataquality_copilot/
├── streamlit_app.py    # UI + app logic (replaces Flask + HTML + CSS)
├── data_quality.py     # Core DQ checks (Pandas + SQLite)
├── llm_engine.py       # Claude AI integration
├── requirements.txt    # Python dependencies
├── .env                # Your API key (never share or commit this)
├── .gitignore          # Keeps .env out of GitHub
└── README.md


Step 1: Install the dependencies:

    pip install -r requirements.txt

Step 2: Create a '.env' file in the project folder.

Step 3: Add your API key to that file:

    ANTHROPIC_API_KEY=sk-ant-your-key-here

No key? The app still works using rule-based analysis, just without AI insights.

Step 4: Run the app:

    streamlit run streamlit_app.py

Your browser opens automatically at http://localhost:8501
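
The key-or-no-key behaviour described in the steps above could be decided along these lines. This is only a sketch: the README does not say how the app reads '.env' or picks its mode, so `parse_env` and `choose_mode` are hypothetical helpers, not functions from this repository.

```python
def parse_env(text):
    """Parse KEY=VALUE lines (the .env format shown above) into a dict.

    Hypothetical helper: the real app may use a library for this instead.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def choose_mode(env):
    """Return "ai" when an Anthropic key is present, else "rule-based"."""
    return "ai" if env.get("ANTHROPIC_API_KEY") else "rule-based"


with_key = parse_env("ANTHROPIC_API_KEY=sk-ant-your-key-here\n")
without_key = parse_env("# no key configured\n")

print(choose_mode(with_key))
print(choose_mode(without_key))
```

Keeping the decision in one small function means the rest of the app never has to check for the key itself.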


What It Does

| Check | Method |
| --- | --- |
| Null values | Pandas .isnull() |
| Duplicate rows | Pandas .duplicated() |
| Statistical outliers | IQR method |
| Data type mismatches | Pandas to_numeric, to_datetime |
| Inconsistent casing | Pandas string methods |
| Invalid email formats | Python regex |
| Live SQL queries | SQLite in-memory |
| AI fix recommendations | Claude API |
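
To make the checks above concrete, here is a minimal sketch of how several of them could be written with Pandas, `re`, and `sqlite3`. The sample DataFrame, the column names, the table name `records`, and the email pattern are all assumptions made for illustration; the actual logic in 'data_quality.py' may differ.

```python
import re
import sqlite3

import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame(
    {
        "name": ["Alice", "alice", "Bob", "Bob", "Carol", None],
        "email": [
            "alice@example.com", "not-an-email", "bob@example.com",
            "bob@example.com", "carol@example.com", None,
        ],
        "amount": ["10", "12", "11", "11", "500", "abc"],
    }
)

# Null values: count missing cells per column.
null_counts = df.isnull().sum().to_dict()

# Duplicate rows: rows identical to an earlier row.
duplicate_count = int(df.duplicated().sum())

# Data type mismatches: values that cannot be parsed as numbers.
amount = pd.to_numeric(df["amount"], errors="coerce")
type_mismatches = int((amount.isna() & df["amount"].notna()).sum())

# Statistical outliers: values beyond 1.5 * IQR from the quartiles.
valid = amount.dropna()
q1, q3 = valid.quantile(0.25), valid.quantile(0.75)
iqr = q3 - q1
outliers = valid[(valid < q1 - 1.5 * iqr) | (valid > q3 + 1.5 * iqr)].tolist()

# Inconsistent casing: spellings that collide once lowercased.
names = df["name"].dropna()
variants = names.groupby(names.str.lower()).nunique()
inconsistent_casing = variants[variants > 1].index.tolist()

# Invalid email formats: a deliberately simple pattern, not full RFC validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
invalid_emails = [e for e in df["email"].dropna() if not EMAIL_RE.match(e)]

# Live SQL queries: load the frame into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
df.to_sql("records", conn, index=False)
sql_null_emails = conn.execute(
    "SELECT COUNT(*) FROM records WHERE email IS NULL"
).fetchone()[0]
conn.close()
```

Each check yields a plain Python value (a dict, an int, or a list), which is easy to collect into a single report for display or for the AI step.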

streamlit_app.py
├── calls → data_quality.py → run_full_profile(df)
└── calls → llm_engine.py → analyze_with_llm(report)

'data_quality.py' and 'llm_engine.py' are plain Python modules whose functions take data in and return results. They have no knowledge of Streamlit, so you could swap the UI for anything else and the logic stays the same.
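
The sketch below illustrates that separation with two different front ends sharing the same logic. The two functions are stand-ins that only borrow the names `run_full_profile` and `analyze_with_llm`: their bodies and return shapes are assumptions, they take a list of dicts rather than a DataFrame to stay dependency-free, and no Claude API call is made.

```python
import json


def run_full_profile(rows):
    """Stand-in for data_quality.run_full_profile (the real one takes a DataFrame)."""
    null_cells = sum(1 for row in rows for value in row.values() if value is None)
    return {"row_count": len(rows), "null_cells": null_cells}


def analyze_with_llm(report):
    """Stand-in for llm_engine.analyze_with_llm; returns a canned summary."""
    if report["null_cells"]:
        return (
            f"{report['null_cells']} missing value(s) found "
            f"across {report['row_count']} rows."
        )
    return "No missing values found."


def cli_front_end(rows):
    """One possible UI: a line of text for a terminal."""
    report = run_full_profile(rows)
    return f"[CLI] {analyze_with_llm(report)}"


def json_front_end(rows):
    """Another possible UI: a JSON payload, e.g. for an HTTP endpoint."""
    report = run_full_profile(rows)
    return json.dumps({"report": report, "summary": analyze_with_llm(report)})


# Hypothetical input rows, invented for this sketch.
rows = [{"a": 1, "b": None}, {"a": 2, "b": 3}]
print(cli_front_end(rows))
print(json_front_end(rows))
```

Neither front end contains any checking logic of its own, which is the property that makes swapping the Streamlit UI for something else cheap.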

Possible future improvements:

  • Export quality report as PDF
  • Connect to PostgreSQL or MySQL instead of SQLite
  • Track quality scores over time
  • Add schema validation with Great Expectations
  • Schedule automated checks with Airflow or cron
