A user-friendly tool for cleaning messy CSV files, available through a web interface and a command-line interface (CLI). It supports both core cleaning operations and more advanced transformations, so common cleanup steps can be applied without writing code.
The primary way to use the Smart CSV Cleaner is through its intuitive web interface.
- Upload & Analyze: Simply upload your CSV file to get an instant analysis, including:
- Row and column counts
- A preview of the first 10 rows
- A count of missing values
- Core Cleaning Operations:
- Fix Column Names: Standardizes column headers (strips extra spaces, removes special characters, and ensures uniqueness).
- Handle Missing Data: Choose a strategy (mean, median, mode, or drop rows) to handle empty cells.
- Remove Outliers: Automatically detect and remove outliers using Z-score or IQR methods.
- Advanced Cleaning Operations:
- Remove Duplicates: Delete duplicate rows from your dataset.
- Trim Whitespace: Remove leading/trailing whitespace from all cells.
- Change Text Case: Convert text to UPPERCASE, lowercase, or Title Case.
- Find and Replace: Replace specific values throughout the dataset.
- Drop Columns: Select and remove unwanted columns.
- Post-Cleaning Analysis: After cleaning, see a "Final File Analysis" with updated stats and a preview of the cleaned data.
- Export Options: Download the cleaned data as a .csv or .xlsx file.
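To make the core cleaning operations above concrete, here is a minimal pandas sketch of how they could work. It is not the tool's actual implementation: the function names, the choice to replace special characters with underscores, and the default thresholds (3.0 for Z-score, 1.5 for IQR) are all assumptions made for illustration.

```python
import re

import numpy as np
import pandas as pd


def fix_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Strip spaces, replace special characters, and make names unique (hypothetical helper)."""
    used = set()
    new_names = []
    for name in df.columns:
        # Assumption: runs of non-alphanumeric characters become a single underscore.
        base = re.sub(r"[^0-9a-zA-Z_]+", "_", str(name).strip()).strip("_") or "column"
        candidate, n = base, 1
        while candidate in used:  # append a numeric suffix until the name is unique
            candidate = f"{base}_{n}"
            n += 1
        used.add(candidate)
        new_names.append(candidate)
    out = df.copy()
    out.columns = new_names
    return out


def fill_missing(df: pd.DataFrame, strategy: str = "mean") -> pd.DataFrame:
    """Fill empty cells with the mean, median, or mode, or drop rows that contain them."""
    if strategy not in {"mean", "median", "mode", "drop"}:
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "drop":
        return df.dropna()
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if strategy == "mode":
            modes = out[col].mode()
            if not modes.empty:
                out[col] = out[col].fillna(modes.iloc[0])
        elif pd.api.types.is_numeric_dtype(out[col]):
            # Mean and median only make sense for numeric columns; others are left as-is.
            value = out[col].mean() if strategy == "mean" else out[col].median()
            out[col] = out[col].fillna(value)
    return out


def remove_outliers(df: pd.DataFrame, method: str = "iqr",
                    z_threshold: float = 3.0, iqr_factor: float = 1.5) -> pd.DataFrame:
    """Drop rows where any numeric column falls outside the Z-score or IQR bounds."""
    if method not in {"zscore", "iqr"}:
        raise ValueError(f"unknown method: {method}")
    keep = pd.Series(True, index=df.index)
    for col in df.select_dtypes(include=np.number).columns:
        values = df[col]
        if method == "zscore":
            std = values.std()
            if pd.isna(std) or std == 0:
                continue  # constant or single-value column: nothing to flag
            within = ((values - values.mean()) / std).abs() <= z_threshold
        else:
            q1, q3 = values.quantile(0.25), values.quantile(0.75)
            spread = q3 - q1
            within = values.between(q1 - iqr_factor * spread, q3 + iqr_factor * spread)
        keep &= within | values.isna()  # missing values are not treated as outliers here
    return df[keep]
```

A typical sequence would be `remove_outliers(fill_missing(fix_column_names(df), "median"), "iqr")`. Filling missing values before removing outliers is a design choice in this sketch; the opposite order keeps extreme values from skewing a mean-based fill.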
The CLI is intended for users who prefer the command line or need to automate cleaning tasks.
- Dual Input Support:
- Traditional flags: --fix-names, --fix-missing mean, --drop-outliers zscore
- Natural language commands: "fix column names and remove outliers"
- Comprehensive Operations: All core cleaning operations from the UI are available via the CLI.
- Detailed Logging: Every transformation is timestamped and logged for full traceability.
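One simple way a natural language command could be mapped onto the traditional flags is keyword matching. The sketch below is purely illustrative and assumes nothing about how the tool parses commands internally: `parse_command`, its keyword rules, the default of `mean` when no strategy is named, and the `iqr` value for `--drop-outliers` are all hypothetical.

```python
import re
from typing import List


def parse_command(text: str) -> List[str]:
    """Translate a plain-English request into CLI flags (hypothetical keyword rules)."""
    text = text.lower()
    flags: List[str] = []

    # Any mention of "name" or "names" is read as a request to fix column names.
    if re.search(r"\bnames?\b", text):
        flags.append("--fix-names")

    # "missing" triggers a fill; pick the first strategy named in the text.
    if "missing" in text:
        strategy = next(
            (s for s in ("mean", "median", "mode") if re.search(rf"\b{s}\b", text)),
            "mean",  # assumption: fall back to the mean when no strategy is given
        )
        flags += ["--fix-missing", strategy]

    # "outlier(s)" triggers outlier removal; the "iqr" flag value is an assumption.
    if "outlier" in text:
        method = "iqr" if re.search(r"\biqr\b", text) else "zscore"
        flags += ["--drop-outliers", method]

    return flags
```

With these rules, `parse_command("fix column names and remove outliers")` yields `["--fix-names", "--drop-outliers", "zscore"]`. Keyword matching is easy to audit but brittle; phrases such as "do not remove outliers" would be misread, so a real parser would need to handle negation.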
- Python 3.7+
- pandas
- click
- tabulate
- openpyxl
- numpy
- Flask
pip install pandas click tabulate openpyxl numpy flask
- Start the Flask server:
python app.py
- Open your web browser and navigate to
http://127.0.0.1:5000.
Run the tool from your terminal with the desired flags:
python main.py --input-file your_data.csv --fix-names --fix-missing median --output-file cleaned_data.csv
Or with a natural language command:
python main.py --input-file your_data.csv --nlp-command "fix names and fill missing values with the median"
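For automation, the same flags can be driven from a short Python script. The following sketch cleans every CSV in a folder by calling main.py once per file. It uses only the flags shown above; the helper names, the `_cleaned` filename suffix, and the assumption that the script runs from the directory containing main.py are illustrative.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(input_file: Path, output_dir: Path,
                  missing_strategy: str = "median") -> List[str]:
    """Build the main.py invocation for one file (hypothetical helper)."""
    # Assumption: cleaned files are written as <name>_cleaned.csv in output_dir.
    output_file = output_dir / f"{input_file.stem}_cleaned.csv"
    return [
        sys.executable, "main.py",
        "--input-file", str(input_file),
        "--fix-names",
        "--fix-missing", missing_strategy,
        "--output-file", str(output_file),
    ]


def clean_folder(input_dir: Path, output_dir: Path) -> None:
    """Run the cleaner on every .csv file in input_dir, stopping at the first failure."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for csv_path in sorted(input_dir.glob("*.csv")):
        # check=True raises CalledProcessError if main.py exits with a non-zero status.
        subprocess.run(build_command(csv_path, output_dir), check=True)


# Example (not executed here): clean_folder(Path("raw"), Path("cleaned"))
```

Passing the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.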