
theoxfaber/filemind


FileMind

Intelligent content-aware file organizer

Automatically categorizes and organizes files based on actual content, not just file extensions. Understands 45+ file types including PDFs, media, code, documents, and more.



What It Does

FileMind analyzes file contents and sorts files into meaningful categories. Instead of relying only on an extension such as .pdf or .jpg, it reads headers and metadata to work out what each file contains.

Example:

Downloads/
├── resume.pdf          → moved to Documents/CVs/
├── vacation.jpg        → moved to Media/Photos/
├── config.yaml         → moved to Dev/Config/
├── notes.md            → moved to Documents/Notes/
└── mystery.bin         → moved to Unknown/ (with analysis)

Why You Need This

Problem: Every developer has a messy Downloads or Documents folder.

Solution: Run FileMind once. Everything organized. No manual work.


Key Features

1. Content-Based Classification

  • Reads file headers and magic bytes
  • Extracts metadata (PDF title, image EXIF, document author)
  • Understands context, not just file type
  • 45+ supported file types

2. 45+ Supported File Types

Documents: PDF, DOCX, PPTX, XLS, TXT, MD, RST
Media: JPG, PNG, GIF, MP4, MP3, WAV, AVI, MOV
Code: PY, JS, TS, RS, GO, C, CPP, JAVA, SQL
Archives: ZIP, RAR, 7Z, TAR, GZ
Dev: JSON, YAML, TOML, ENV, DOCKERFILE, LOCK
Binaries: EXE, DLL, SO, DYLIB
Data: CSV, PARQUET, DB, SQLITE
Other: 20+ more types

3. Keyword Extraction

  • Reads PDF text and finds keywords
  • Analyzes document titles and metadata
  • Scores relevance for smarter categorization
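A minimal sketch of how keyword scoring could work, assuming the text has already been extracted from the document. The rule table, the keywords, and the name best_category are hypothetical and are not taken from FileMind's source.

```rust
/// Hypothetical keyword rules: (target folder, keywords). Not FileMind's real table.
const RULES: &[(&str, &[&str])] = &[
    ("Documents/CVs", &["experience", "education", "skills"]),
    ("Documents/Receipts", &["invoice", "receipt", "total"]),
];

/// Return the folder with the most distinct keywords present in `text`,
/// together with its hit count. Matching is a case-insensitive substring
/// test, so "total" would also match "totally".
fn best_category<'a>(text: &str, rules: &[(&'a str, &[&str])]) -> Option<(&'a str, usize)> {
    let lowered = text.to_lowercase();
    rules
        .iter()
        .map(|(folder, keywords)| {
            let hits = keywords.iter().filter(|k| lowered.contains(**k)).count();
            (*folder, hits)
        })
        .filter(|(_, hits)| *hits > 0)
        .max_by_key(|(_, hits)| *hits)
}

fn main() {
    let text = "Education: BSc. Experience: five years of Rust.";
    println!("{:?}", best_category(text, RULES));
}
```

Returning the hit count next to the folder keeps the decision explainable: a caller can show why a file landed where it did, or fall back to Unknown/ when nothing matches.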

4. Deterministic Organization

  • Batched operations (fast, reliable)
  • Consistent results every time
  • Reproducible folder structure
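One way to get reproducible batches, sketched under the assumption that determinism comes from sorting the input before splitting it. Directory listings are returned in a platform-dependent order, so sorting first makes every run see the same batches. The function name plan_batches is illustrative only.

```rust
use std::path::PathBuf;

/// Sort and de-duplicate the paths, then split them into fixed-size batches.
/// A batch size of 0 is treated as 1 so that `chunks` never panics.
fn plan_batches(mut files: Vec<PathBuf>, batch_size: usize) -> Vec<Vec<PathBuf>> {
    files.sort();
    files.dedup();
    files
        .chunks(batch_size.max(1))
        .map(|chunk| chunk.to_vec())
        .collect()
}

fn main() {
    let files = vec!["b.txt", "a.txt", "c.txt"]
        .into_iter()
        .map(PathBuf::from)
        .collect();
    for (i, batch) in plan_batches(files, 2).iter().enumerate() {
        println!("batch {i}: {batch:?}");
    }
}
```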

5. Undo System

  • Every operation is tracked
  • filemind undo reverses the last organization run
  • Full operation history
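The tracking idea can be sketched as a journal of moves that is replayed in reverse. This is an assumption about the general approach, not FileMind's actual audit format; MoveRecord and Journal are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

#[derive(Debug)]
struct MoveRecord {
    from: PathBuf,
    to: PathBuf,
}

#[derive(Debug, Default)]
struct Journal {
    records: Vec<MoveRecord>,
}

impl Journal {
    /// Move a file and remember where it came from.
    fn move_file(&mut self, from: PathBuf, to: PathBuf) -> io::Result<()> {
        if let Some(parent) = to.parent() {
            fs::create_dir_all(parent)?;
        }
        // fs::rename does not work across mount points.
        fs::rename(&from, &to)?;
        self.records.push(MoveRecord { from, to });
        Ok(())
    }

    /// Reverse recorded moves, newest first. A record is dropped only after
    /// its file is back in place, so a failure leaves the rest undoable.
    fn undo_all(&mut self) -> io::Result<usize> {
        let mut restored = 0;
        while let Some(record) = self.records.last() {
            fs::rename(&record.to, &record.from)?;
            self.records.pop();
            restored += 1;
        }
        Ok(restored)
    }
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join(format!("filemind-undo-demo-{}", std::process::id()));
    fs::create_dir_all(&root)?;
    fs::write(root.join("notes.md"), "hello")?;

    let mut journal = Journal::default();
    journal.move_file(root.join("notes.md"), root.join("Documents/Notes/notes.md"))?;
    println!("restored {} file(s)", journal.undo_all()?);
    fs::remove_dir_all(&root)
}
```

A real tool would also persist the journal to disk so that undo still works after the process exits; this sketch keeps it in memory for brevity.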

6. Audit Mode

  • Preview changes before applying
  • --dry-run shows what would be moved
  • Zero risk testing
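A dry run is easiest to trust when planning and applying are separate steps that share one code path. The sketch below assumes that design; PlannedMove and apply are hypothetical names, and the printed format is not FileMind's.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

struct PlannedMove {
    from: PathBuf,
    to: PathBuf,
}

/// Apply a plan, or only print it when `dry_run` is set.
/// Returns the number of files actually moved.
fn apply(plan: &[PlannedMove], dry_run: bool) -> io::Result<usize> {
    let mut moved = 0;
    for step in plan {
        if dry_run {
            // Nothing on disk is touched in this branch.
            println!("[dry-run] {} -> {}", step.from.display(), step.to.display());
            continue;
        }
        if let Some(parent) = step.to.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::rename(&step.from, &step.to)?;
        moved += 1;
    }
    Ok(moved)
}

fn main() -> io::Result<()> {
    let plan = vec![PlannedMove {
        from: PathBuf::from("Downloads/resume.pdf"),
        to: PathBuf::from("Documents/CVs/resume.pdf"),
    }];
    let moved = apply(&plan, true)?;
    println!("{moved} file(s) moved");
    Ok(())
}
```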

7. Size Optimization

  • Size-bucketed organization (< 1MB, 1-100MB, > 100MB)
  • Identifies duplicates via hash
  • Cleanup recommendations
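The three size buckets could be expressed as a small function. Two details are assumptions here because the README does not state them: 1MB is taken as 1024 * 1024 bytes, and a file of exactly 100MB falls into the middle bucket. The names SizeBucket and size_bucket are illustrative.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum SizeBucket {
    Small,  // under 1MB
    Medium, // 1MB up to and including 100MB (boundary handling is an assumption)
    Large,  // over 100MB
}

const MB: u64 = 1024 * 1024;

fn size_bucket(bytes: u64) -> SizeBucket {
    if bytes < MB {
        SizeBucket::Small
    } else if bytes <= 100 * MB {
        SizeBucket::Medium
    } else {
        SizeBucket::Large
    }
}

fn main() {
    for bytes in [512 * 1024, 5 * MB, 2_000 * MB] {
        println!("{bytes} bytes -> {:?}", size_bucket(bytes));
    }
}
```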

Quick Start

Installation

git clone https://github.com/theoxfaber/filemind
cd filemind
cargo build --release

# Or install as command
cargo install --path .

Basic Usage

# Organize Downloads folder
filemind organize ~/Downloads

# Preview first (dry run)
filemind organize ~/Downloads --dry-run

# With verbose output
filemind organize ~/Downloads --verbose

# Create custom structure
filemind organize ~/Documents --config my-structure.json

# Undo last operation
filemind undo

Output

[FileMind] Analyzing 342 files...
[✓] 156 documents moved to Documents/
[✓] 78 images moved to Media/Photos/
[✓] 45 videos moved to Media/Videos/
[✓] 52 code files moved to Dev/
[✓] 11 archives moved to Archives/

Organization complete!
- Time taken: 2.3s
- Moved: 342 files
- Duplicates found: 12
- Could not classify: 3

Next steps:
  filemind show-duplicates    # See duplicate files
  filemind cleanup            # Remove duplicates

How It Works

1. File Analysis

  • Read file header (first 512 bytes) for magic bytes
  • Extract metadata (title, author, duration, etc.)
  • Determine primary content type
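The header read can be sketched as follows. The signatures listed are well-known public ones, but the table itself, and the names sniff and sniff_file, are illustrative rather than FileMind's own magic.rs.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// A few well-known signatures found at offset 0. Illustrative, not exhaustive.
const SIGNATURES: &[(&[u8], &str)] = &[
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xFF\xD8\xFF", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),
    (b"\x7FELF", "elf"),
];

/// Match a header buffer against the signature table.
fn sniff(header: &[u8]) -> Option<&'static str> {
    SIGNATURES
        .iter()
        .find(|(magic, _)| header.starts_with(magic))
        .map(|(_, kind)| *kind)
}

/// Read at most 512 bytes from the start of a file and sniff them.
fn sniff_file(path: &Path) -> io::Result<Option<&'static str>> {
    let mut header = Vec::with_capacity(512);
    File::open(path)?.take(512).read_to_end(&mut header)?;
    Ok(sniff(&header))
}

fn main() -> io::Result<()> {
    println!("{:?}", sniff(b"%PDF-1.7"));
    // Optionally sniff a real file passed on the command line.
    if let Some(arg) = std::env::args().nth(1) {
        println!("{:?}", sniff_file(Path::new(&arg))?);
    }
    Ok(())
}
```

Signatures alone are not always enough: DOCX and PPTX files are ZIP containers and start with the same PK bytes as a plain .zip, which is where the metadata step earns its place.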

2. Classification

  • Match magic bytes against known signatures
  • Analyze metadata context
  • Apply keyword extraction for accuracy
  • Assign confidence score
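A confidence score becomes explainable when each verdict carries the reason it was reached. The sketch below combines two signals, content signature and extension. The numeric scores, the Verdict type, and the function names are assumptions chosen for illustration; FileMind's real tiers and weights are not documented here.

```rust
#[derive(Debug)]
struct Verdict {
    kind: &'static str,
    confidence: f32,
    reason: &'static str,
}

/// Signal 1: content signature. Only two are listed to keep the sketch short.
fn kind_from_magic(header: &[u8]) -> Option<&'static str> {
    if header.starts_with(b"%PDF-") {
        Some("pdf")
    } else if header.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some("png")
    } else {
        None
    }
}

/// Signal 2: file extension, compared case-insensitively.
fn kind_from_extension(ext: &str) -> Option<&'static str> {
    match ext.to_ascii_lowercase().as_str() {
        "pdf" => Some("pdf"),
        "png" => Some("png"),
        "md" => Some("markdown"),
        _ => None,
    }
}

/// Combine both signals. The scores are made-up illustrative values.
fn classify(header: &[u8], ext: &str) -> Verdict {
    match (kind_from_magic(header), kind_from_extension(ext)) {
        (Some(m), Some(e)) if m == e => Verdict {
            kind: m,
            confidence: 0.95,
            reason: "signature and extension agree",
        },
        (Some(m), _) => Verdict {
            kind: m,
            confidence: 0.80,
            reason: "signature only; extension missing or misleading",
        },
        (None, Some(e)) => Verdict {
            kind: e,
            confidence: 0.50,
            reason: "extension only",
        },
        (None, None) => Verdict {
            kind: "unknown",
            confidence: 0.0,
            reason: "no signature or known extension",
        },
    }
}

fn main() {
    // A PDF that was renamed to .png is still reported as a PDF.
    println!("{:?}", classify(b"%PDF-1.4", "png"));
}
```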

3. Organization

  • Create target folder structure
  • Batch move operations for reliability
  • Track all moves (for undo)
  • Report duplicates

4. Optimization

  • Identify duplicate files (same hash)
  • Suggest cleanup actions
  • Size analysis and recommendations
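Duplicate detection by hash can be sketched as grouping files by size and content hash. To stay dependency-free this example uses the standard library's DefaultHasher, which is not cryptographic and whose algorithm may change between Rust releases; that is a substitution for illustration, and a real tool would more likely use SHA-256 or BLAKE3 from a crate. File contents are passed in memory so the example needs no files on disk.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::Hasher;

/// Hash raw bytes. Stand-in for a cryptographic hash (see the note above).
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Group file names whose contents share the same length and hash.
/// Only groups with more than one member are returned.
fn find_duplicates<'a>(files: &[(&'a str, &[u8])]) -> Vec<Vec<&'a str>> {
    let mut groups: BTreeMap<(usize, u64), Vec<&'a str>> = BTreeMap::new();
    for (name, bytes) in files {
        groups
            .entry((bytes.len(), content_hash(bytes)))
            .or_default()
            .push(*name);
    }
    groups.into_values().filter(|group| group.len() > 1).collect()
}

fn main() {
    let files: &[(&str, &[u8])] = &[
        ("report.pdf", b"same bytes"),
        ("report_final.pdf", b"same bytes"),
        ("notes.md", b"other bytes"),
    ];
    println!("{:?}", find_duplicates(files));
}
```

Because any hash can collide, comparing the bytes of candidate duplicates before deleting anything is a sensible extra check.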

Configuration

Create filemind.json for custom organization:

{
  "root": "/Users/you",
  "structure": {
    "Documents": {
      "CVs": ["pdf"],
      "Receipts": ["pdf"],
      "Notes": ["txt", "md"],
      "Books": ["pdf", "epub"]
    },
    "Media": {
      "Photos": ["jpg", "png", "webp"],
      "Videos": ["mp4", "mkv", "mov"],
      "Audio": ["mp3", "flac", "wav"]
    },
    "Dev": {
      "Config": ["json", "yaml", "toml"],
      "Code": ["py", "rs", "js", "go"],
      "SQL": ["sql", "db"]
    },
    "Archives": ["zip", "7z", "rar"],
    "Unknown": []
  },
  "rules": {
    "min_file_size_kb": 10,
    "follow_symlinks": false,
    "ignore_hidden": true,
    "batch_size": 100
  }
}

Then run:

filemind organize . --config filemind.json
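In the sample configuration above, pdf is listed under CVs, Receipts, and Books, so an extension alone cannot pick a folder. The sketch below inverts part of that structure into an extension index to make the ambiguity visible. The rows are copied by hand from the sample; real code would parse the file (for example with a JSON crate), and the names STRUCTURE and candidates_by_extension are hypothetical.

```rust
use std::collections::BTreeMap;

/// (top-level folder, sub-folder, extensions), mirroring part of the sample config.
const STRUCTURE: &[(&str, &str, &[&str])] = &[
    ("Documents", "CVs", &["pdf"]),
    ("Documents", "Receipts", &["pdf"]),
    ("Documents", "Notes", &["txt", "md"]),
    ("Documents", "Books", &["pdf", "epub"]),
    ("Media", "Photos", &["jpg", "png", "webp"]),
    ("Dev", "Config", &["json", "yaml", "toml"]),
];

/// Map each extension to every folder that lists it, in config order.
fn candidates_by_extension() -> BTreeMap<&'static str, Vec<String>> {
    let mut index: BTreeMap<&'static str, Vec<String>> = BTreeMap::new();
    for (top, sub, exts) in STRUCTURE {
        for ext in exts.iter() {
            index.entry(*ext).or_default().push(format!("{top}/{sub}"));
        }
    }
    index
}

fn main() {
    for (ext, folders) in candidates_by_extension() {
        println!("{ext}: {folders:?}");
    }
}
```

Extensions with a single candidate can be routed directly; those with several, such as pdf, are the cases where content signals like extracted keywords have to break the tie.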

Advanced Features

Find Duplicates

filemind find-duplicates ~/Downloads

# Output:
# Hash: a1b2c3d4e5f6...
#   1. report.pdf (2.3 MB)
#   2. report_final.pdf (2.3 MB)
#   3. report_final_FINAL.pdf (2.3 MB)

Cleanup Duplicates

filemind cleanup-duplicates ~/Downloads

# Interactively choose which to keep
# Others are moved to Trash/

Size Analysis

filemind analyze-sizes ~/Downloads

# Output:
# Total size: 125 GB
# Largest files:
#   Video.mp4: 45 GB
#   Archive.zip: 32 GB
#   Database.db: 28 GB
# 
# Duplicates: 12 GB could be freed
# Unused: 3.2 GB (not accessed in 6 months)

Audit Trail

filemind history

# Shows all operations with timestamps
# Use for recovering from accidental moves

Performance

  • Throughput: 100-200 files/second
  • Memory: ~50MB baseline, growing with the number of files
  • Disk I/O: Sequential reads, minimal writes
  • Batching: 100-file batches for reliability

Example: 10,000 files

Time: ~60 seconds
Memory: ~150MB
Success rate: 99.8%
Errors: 17 (permission denied, etc.)

Project Structure

filemind/
├── src/
│   ├── main.rs              # CLI entry point
│   ├── classifier.rs        # File classification engine
│   ├── magic.rs             # Magic byte matching
│   ├── metadata.rs          # Metadata extraction
│   ├── organizer.rs         # Move operations
│   ├── audit.rs             # History tracking
│   ├── dedup.rs             # Duplicate detection
│   └── types.rs             # Data structures
├── tests/                   # Integration tests
├── Cargo.toml
└── README.md

Installation Methods

From Source

git clone https://github.com/theoxfaber/filemind
cd filemind
cargo install --path .

From Cargo

cargo install filemind

From Release Binary

Download a prebuilt binary from the repository's Releases page.


Testing

# Unit tests
cargo test

# Integration tests (actual file operations)
cargo test -- --include-ignored

# Benchmark
cargo bench

Known Limitations

  • Symlinks: Symbolic links are skipped by default (set follow_symlinks to true in the config to enable them)
  • Network drives: Performance is limited by network speed
  • Large files: Analysis of files larger than 5GB may be slow
  • Special chars: Some special characters in file names may cause issues on certain filesystems

Future Roadmap

  • Machine learning-based classification
  • Watch mode (auto-organize on new files)
  • Cloud sync integration
  • GUI application
  • Network share support

Troubleshooting

"Permission Denied" Errors

# Check permissions
ls -la ~/Downloads

# Run with sudo if needed (careful!)
sudo filemind organize ~/Protected

Files Not Moving

# Use --verbose to see why
filemind organize . --verbose

# Check audit log
filemind history

Undo Failed

# View undo history
filemind history --limit 10

# Manual recovery (files in FileMind/Trash)
find . -path "*FileMind/Trash*" -type f

Contributing

Contributions welcome:

  1. Fork the repo
  2. Create feature branch
  3. Add tests
  4. Submit PR

License

MIT License. See LICENSE.


Get In Touch

💬 Bug report? Open an issue
💼 Want to use FileMind professionally? Available for consulting
📧 Questions? DM me


Built with Rust | Fast. Reliable. Open Source.

⭐ If this saved you time, star the repo!

About

🧠 Deterministic, content-aware file organizer: zero AI, zero network, single Rust binary. 3-tier classifier with explainable confidence scores, full undo, and TOML config.
