v1.0.0 - Official release, the start of a growing database
caniscrape v1.0.0 Release Notes
The first major release! v1.0.0 marks caniscrape's transition from a standalone CLI tool to a complete cloud-connected platform for tracking website protections over time.
What's New in v1.0.0
1. Cloud Integration
Connect your local CLI to caniscrape Cloud for persistent scan history, team collaboration, and protection change tracking.
Key features:
- Project Management: Create projects to organize scans by purpose (e.g., "E-commerce Scraper", "News Aggregator")
- Automatic Sync: Enable auto-upload to push every scan to the cloud instantly
- Scan History: Track how site protections change over time
- Smart Diffing: Automatically compare new scans against previous ones to detect protection changes
- Offline Support: Scans are cached locally when you are offline; upload them later with caniscrape push
New commands:
caniscrape init # Initialize a cloud project
caniscrape link # Connect to an existing project
caniscrape push # Upload cached scans
caniscrape config set # Configure auto-upload settings
Example workflow:
# One-time setup
caniscrape init
# Every scan now automatically syncs to cloud
caniscrape scan https://example.com
# View history at https://caniscrape.org/projects
2. Privacy-First Telemetry
v1.0.0 adds two separate, opt-in telemetry systems that help improve caniscrape.
Usage Telemetry (anonymous):
- CLI version, Python version, OS type
- Commands used and success/failure rates
- Error types (no URLs or personal data)
- Completely anonymous with device ID only
Public Scan Database (like Shodan for anti-bot defenses):
- Opt-in contribution of scan results to a searchable public database
- See how site protections change over time across all users
- Compare different sites' protection strategies
- Currently free while building the database
Full control:
caniscrape telemetry usage on/off # Toggle usage telemetry
caniscrape telemetry scans on/off # Toggle scan contributions
caniscrape telemetry delete # GDPR data deletion
caniscrape telemetry status # View current settings
What we DON'T collect:
- Your name, email, or IP address (usage telemetry)
- Authentication tokens or credentials (scan telemetry)
- Any personally identifiable information
3. Scan Comparison & Change Detection
caniscrape automatically detects when site protections change between scans.
Detected changes:
- Difficulty score increases/decreases
- New protections added (WAF, CAPTCHA, fingerprinting)
- Protections removed or disabled
- Status changes for existing protections
Example output:
Changes Since Last Scan (2025-10-15 14:30)
⚠️ Difficulty Score: +3 points (site got harder to scrape)
⚠️ New Protections Detected:
+ WAF: DataDome
+ Canvas Fingerprinting
✅ Protections Removed:
- CAPTCHA: reCAPTCHA v2
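The comparison behind output like this can be pictured as a set difference over detected protections plus a score delta. The sketch below is only an illustration under stated assumptions: `ScanResult` and `diff_scans` are hypothetical names, and the real scan schema is richer than a score and a set of labels.

```python
from dataclasses import dataclass, field


@dataclass
class ScanResult:
    """Hypothetical, simplified scan record (not caniscrape's real schema)."""
    score: int
    protections: set[str] = field(default_factory=set)


def diff_scans(previous: ScanResult, current: ScanResult) -> dict:
    """Report the score delta and which protections were added or removed."""
    return {
        "score_delta": current.score - previous.score,
        "added": sorted(current.protections - previous.protections),
        "removed": sorted(previous.protections - current.protections),
    }


# Mirrors the example output above: +3 points, two additions, one removal.
previous = ScanResult(score=4, protections={"CAPTCHA: reCAPTCHA v2"})
current = ScanResult(score=7, protections={"WAF: DataDome", "Canvas Fingerprinting"})
changes = diff_scans(previous, current)
print(changes)
```

Using sets keeps the comparison independent of the order in which detectors report their findings.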
4. Improved CLI Structure
The CLI has been restructured for better organization and extensibility:
New command structure:
caniscrape scan <url> # Analyze a website (replaces caniscrape <url>)
caniscrape init # Initialize cloud project
caniscrape link # Link to existing project
caniscrape push # Push cached scans
caniscrape config set/show # Manage configuration
caniscrape telemetry # Manage telemetry settings
Backward compatibility note:
# No longer works (old syntax)
caniscrape <url>
# Works (new syntax)
caniscrape scan <url>
5. Improved Error Handling & UX
- Better error messages with actionable guidance
- Clear prompts for authentication and setup
- Informative status messages during long operations
- Graceful handling of network failures and timeouts
- Improved progress indicators for multi-step operations
6. Configuration Management
Fine-grained control over CLI behavior:
# Enable/disable auto-upload
caniscrape config set auto-upload on
caniscrape config set auto-upload off
# View current configuration
caniscrape config show
Configuration hierarchy:
- Searches parent directories for .caniscrape/config
- Allows different projects in subdirectories
- Works like git's configuration system
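A git-style lookup of this kind can be sketched in a few lines of Python. This is a minimal illustration, not caniscrape's actual code: `find_config` is a hypothetical helper, and it assumes `.caniscrape/config` is a regular file.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path) -> Optional[Path]:
    """Walk from `start` up to the filesystem root and return the first
    .caniscrape/config found, or None if no ancestor directory has one."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".caniscrape" / "config"
        if candidate.is_file():  # assumption: the config is a plain file
            return candidate
    return None


# The nearest config wins, so a subdirectory can point at a different project.
print(find_config(Path.cwd()))
```

Because the search stops at the first match, a config in a subdirectory shadows one higher up the tree.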
Technical Improvements
API Client
- Robust error handling with retry logic
- Rate limit detection and user-friendly messages
- Token expiration handling with re-authentication prompts
- Proper timeout management
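As a rough picture of what retry logic with rate-limit detection can look like, here is a standard-library-only sketch. Everything in it is an assumption made for illustration: `with_retries`, `RateLimited`, the attempt count, and the backoff delays are hypothetical and not taken from caniscrape's client.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error for a rate-limit response from the server."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying connection and timeout errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except RateLimited:
            # Rate limits are surfaced to the user instead of being retried.
            raise
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.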
Caching System
- Local scan results cache in .caniscrape/cache/
- Automatic cache cleanup after successful push
- Metadata tracking (timestamp, CLI version, URL)
- Works as fallback when offline or rate-limited
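The cache-then-push behavior can be sketched as follows. This is an illustration under assumptions, not the shipped implementation: the function names, the JSON layout, and the `scan-<id>.json` file naming are hypothetical; only the metadata fields (timestamp, CLI version, URL) and the cleanup-after-push rule come from the notes above.

```python
import json
import time
import uuid
from pathlib import Path
from typing import Callable


def cache_scan(cache_dir: Path, url: str, result: dict, cli_version: str) -> Path:
    """Write one scan result plus metadata to cache_dir and return the file path."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "metadata": {"timestamp": time.time(), "cli_version": cli_version, "url": url},
        "result": result,
    }
    path = cache_dir / f"scan-{uuid.uuid4().hex}.json"  # hypothetical naming scheme
    path.write_text(json.dumps(record), encoding="utf-8")
    return path


def push_cached(cache_dir: Path, upload: Callable[[dict], None]) -> int:
    """Upload every cached scan; delete a file only after its upload succeeds."""
    pushed = 0
    for path in sorted(cache_dir.glob("scan-*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        upload(record)  # expected to raise on failure, which leaves the file cached
        path.unlink()
        pushed += 1
    return pushed
```

Deleting each file only after its own upload returns means an interrupted push leaves the remaining scans in place for the next attempt.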
Diff Engine
- Intelligent comparison of scan results
- Handles schema changes between versions
- Filters out noise (e.g., duplicate Cloudflare detections)
- Clear visualization of changes
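One way to picture the noise filtering is a vendor-level de-duplication pass that runs before two scans are compared. The sketch below assumes detections arrive as hypothetical (category, vendor) pairs and that the first report per vendor is the one to keep; the real engine may use different rules.

```python
def dedupe_by_vendor(detections: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first detection per vendor, preserving the original order."""
    seen: set[str] = set()
    unique: list[tuple[str, str]] = []
    for category, vendor in detections:
        key = vendor.casefold()  # treat "Cloudflare" and "cloudflare" as the same vendor
        if key not in seen:
            seen.add(key)
            unique.append((category, vendor))
    return unique


detections = [
    ("WAF", "Cloudflare"),
    ("Fingerprinting", "Cloudflare"),  # same vendor reported by a second detector
    ("WAF", "DataDome"),
]
deduped = dedupe_by_vendor(detections)
print(deduped)
```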
Updated Scoring System
The difficulty scoring from v0.3.0 remains unchanged, but now integrates with cloud tracking:
| Detection | Impact |
|---|---|
| Known bot detection service detected | +2 points |
| Canvas fingerprinting signal | +1 point |
| Browser function modifications | +1 point |
| CAPTCHA on page load | +5 points |
| CAPTCHA after rate limit | +4 points |
| DataDome/PerimeterX WAF | +4 points |
| Akamai/Imperva WAF | +3 points |
| Aggressive rate limiting | +3 points |
| Cloudflare WAF | +2 points |
| Honeypot traps detected | +2 points |
| TLS fingerprinting active | +1 point |
Score interpretation remains the same:
- 0-2: Easy (basic scraping will work)
- 3-4: Medium (need some precautions)
- 5-7: Hard (requires advanced techniques)
- 8-10: Very Hard (consider using a service)
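To make the arithmetic concrete, the table and the score bands can be combined in a short sketch. The point values are copied from the table; the dictionary keys and function names are hypothetical, and clamping the total to 10 is an assumption inferred from the 0-10 scale rather than a documented rule.

```python
# Point values copied from the table above; the key names are hypothetical.
WEIGHTS = {
    "known_bot_detection_service": 2,
    "canvas_fingerprinting": 1,
    "browser_function_modifications": 1,
    "captcha_on_page_load": 5,
    "captcha_after_rate_limit": 4,
    "waf_datadome_perimeterx": 4,
    "waf_akamai_imperva": 3,
    "aggressive_rate_limiting": 3,
    "waf_cloudflare": 2,
    "honeypot_traps": 2,
    "tls_fingerprinting": 1,
}


def difficulty_score(detections: list[str]) -> int:
    """Sum the points for each distinct detection, clamped to the 0-10 scale."""
    # Assumption: points are additive and the total is capped at 10.
    return min(10, sum(WEIGHTS[name] for name in set(detections)))


def interpret(score: int) -> str:
    """Map a score to the bands listed above."""
    if score <= 2:
        return "Easy"
    if score <= 4:
        return "Medium"
    if score <= 7:
        return "Hard"
    return "Very Hard"


score = difficulty_score(["waf_cloudflare", "canvas_fingerprinting"])
print(score, interpret(score))
```

Under these assumptions, a site with a Cloudflare WAF and a canvas fingerprinting signal scores 2 + 1 = 3, which falls in the Medium band.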
New Use Cases
For Consultants
- Historical data: Show clients how site protections evolved
- Professional presentation: Cloud dashboard looks more professional than CLI output
For Long-Term Monitoring
- Protection tracking: See when sites add/remove defenses
- Seasonal patterns: Identify when sites tighten security (e.g., Black Friday)
- Regression detection: Get alerted when sites become easier to scrape
Breaking Changes
Command Structure
Old (v0.3.0):
caniscrape https://example.com
New (v1.0.0):
caniscrape scan https://example.com
Coming in v1.1.0
Scheduled Scans: Automatic re-scanning on a schedule
Bug Fixes
- Fixed double-counting of Cloudflare in both WAF and fingerprinting detection
- Improved proxy handling in CAPTCHA solver integration
- Better error messages when wafw00f is missing
- Fixed edge cases in diff engine when comparing old scan formats
- Corrected behavioral detector link counting logic
Acknowledgments
- Community feedback: Thank you to everyone who tested v0.3.0
- Dependencies: Built on wafw00f, Playwright, curl_cffi, and other amazing open-source projects
Feedback & Support
GitHub Issues: https://github.com/ZA1815/caniscrape/issues
Documentation: https://docs.caniscrape.org (coming soon)
Cloud Dashboard: https://caniscrape.org