
v0.3.0 - Improving detection on the "tough but not too tough" sites


ZA1815 released this 24 Oct 23:22
· 13 commits to main since this release

🔍 caniscrape v0.3.0 Release Notes

This release adds advanced fingerprinting detection to caniscrape. While v0.2.0 focused on automation (proxies and CAPTCHA solving), v0.3.0 looks more closely at how sites detect bots, giving you the information you need to build stealthier scrapers.


🔬 What's New

Feature: Advanced Fingerprinting Detection

A brand-new analyzer that detects enterprise-grade bot detection services running on the client side.

What it detects:

  • Known Bot Detection Services: Identifies scripts from PerimeterX (HUMAN), DataDome, Akamai Bot Manager, Cloudflare Bot Management, Imperva, Kasada, Shape Security, CHEQ, and Radware
  • Canvas Fingerprinting Signals: Detects when canvas APIs have been modified (strong indicator of fingerprinting)
  • Behavioral Tracking: Monitors which user events the site is listening to (mousemove, scroll, keydown, etc.)
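To make the service detection idea concrete, here is a minimal sketch that scans a page's `<script>` tags for vendor-specific substrings using Python's standard `html.parser`. The signature table (`SERVICE_SIGNATURES`) and its patterns are hypothetical placeholders, not caniscrape's real signature list. Real vendor hostnames and paths also change over time.

```python
from html.parser import HTMLParser

# Hypothetical signature table: substrings that might appear in script URLs
# or inline code for each vendor. Illustrative placeholders only, NOT
# caniscrape's actual signatures.
SERVICE_SIGNATURES = {
    "DataDome": ["datadome.co"],
    "PerimeterX (HUMAN)": ["perimeterx.net", "px-cdn.net"],
    "Kasada": ["kasada"],
}


class ScriptCollector(HTMLParser):
    """Collects the src attribute and inline body of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []  # external script URLs
        self.inline = []   # inline script bodies
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # html.parser delivers raw <script> contents through handle_data
        if self._in_script and data.strip():
            self.inline.append(data)


def detect_services(html: str) -> list[str]:
    """Return vendor names whose signature appears in any script src or body."""
    collector = ScriptCollector()
    collector.feed(html)
    haystack = " ".join(collector.sources + collector.inline).lower()
    return sorted(
        name
        for name, patterns in SERVICE_SIGNATURES.items()
        if any(p in haystack for p in patterns)
    )
```

For example, `detect_services('<script src="https://js.datadome.co/tags.js"></script>')` returns `['DataDome']`.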

Why this matters:
These services run in the browser, so tools that only inspect HTTP responses can barely see them. They track mouse movements, measure timing inconsistencies, and fingerprint your browser's canvas rendering. If you're being blocked and don't know why, this analyzer can help identify the cause.
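Timing inconsistencies are one way these services tell scripts apart from people. The sketch below illustrates the general idea with a single simplified heuristic. It computes the coefficient of variation of the gaps between events and flags a stream whose gaps are nearly uniform. The function name `looks_scripted` and the `0.1` threshold are assumptions for illustration. Commercial vendors use far richer, undisclosed models.

```python
import statistics

# Hypothetical threshold: gaps whose coefficient of variation falls below this
# value are treated as "too regular to be human". Illustrative only.
REGULARITY_THRESHOLD = 0.1


def looks_scripted(timestamps_ms: list[float]) -> bool:
    """Flag an event stream whose inter-event gaps are suspiciously uniform."""
    if len(timestamps_ms) < 3:
        return False  # not enough events to judge
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    mean = statistics.mean(gaps)
    if mean <= 0:
        return True  # zero or negative gaps: events fired in one burst
    cv = statistics.pstdev(gaps) / mean  # coefficient of variation
    return cv < REGULARITY_THRESHOLD
```

For example, `looks_scripted([0, 50, 100, 150, 200])` returns `True` because every gap is exactly 50 ms.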

Feature: Browser Integrity Analysis

A forensic-level check that compares critical browser functions on the target site against a clean baseline.

What it checks:

  • Canvas API tampering (strong fingerprinting indicator)
  • Plugin/MimeType spoofing (headless browser evasion)
  • Network function hooks (fetch/XMLHttpRequest monitoring)
  • Timing function alterations (performance/Date API)
  • Anti-debugging techniques (console modifications)

Why this matters:
Sites can inject JavaScript that modifies or wraps native browser functions, for example to fingerprint your canvas rendering or to monitor network requests. This analyzer detects these modifications and explains what each one indicates.
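A common way to spot a wrapped function is to call `Function.prototype.toString` on it inside the page. Built-ins return a string like `function toDataURL() { [native code] }`, while a JavaScript wrapper returns its own source. The sketch below assumes those strings have already been captured from the page and shows only the comparison step. `find_tampering` and the `REASONS` table are hypothetical. A determined site can also spoof `toString`, so this check is not conclusive on its own.

```python
import re

# Matches the string engines return for built-in functions, e.g.
# "function toDataURL() { [native code] }". Whitespace varies by engine.
NATIVE_RE = re.compile(r"^function\s*[\w$]*\s*\(\)\s*\{\s*\[native code\]\s*\}$")

# Hypothetical reason table mirroring the categories listed above;
# not caniscrape's internal mapping.
REASONS = {
    "HTMLCanvasElement.prototype.toDataURL": "Strong indicator of Canvas Fingerprinting.",
    "window.fetch": "Network requests may be monitored.",
    "XMLHttpRequest.prototype.open": "Network requests may be monitored.",
    "performance.now": "Timing functions altered.",
    "console.log": "Possible anti-debugging technique.",
}


def find_tampering(captured: dict[str, str]) -> dict[str, str]:
    """Return {function path: reason} for every function that is not native.

    `captured` maps a function path to the result of
    Function.prototype.toString.call(fn) evaluated inside the target page.
    """
    findings = {}
    for path, source in captured.items():
        if not NATIVE_RE.match(source.strip()):
            findings[path] = REASONS.get(path, "Function was modified.")
    return findings
```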


📊 Scoring Updates

The difficulty scoring system now accounts for advanced client-side protections:

| Detection | Impact | New in v0.3.0 |
|---|---|---|
| Known bot detection service detected | +2 points | Yes |
| Canvas fingerprinting signal | +1 point | Yes |
| Browser function modifications | +1 point | Yes |
| CAPTCHA on page load | +5 points | |
| CAPTCHA after rate limit | +4 points | |
| DataDome/PerimeterX WAF | +4 points | |
| Akamai/Imperva WAF | +3 points | |
| Aggressive rate limiting | +3 points | |
| Cloudflare WAF | +2 points | |
| Honeypot traps detected | +2 points | |
| TLS fingerprinting active | +1 point | |
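The scoring step amounts to summing points for each detected signal. Below is a minimal sketch using the point values from the table. Three parts of it are assumptions:

- The signal keys are hypothetical names.
- Each signal is assumed to count once.
- The cap at 10 follows from the 0-10 interpretation scale.

```python
# Point values from the scoring table. Keys are hypothetical names chosen for
# this sketch, not caniscrape's internal identifiers.
POINTS = {
    "known_bot_service": 2,
    "canvas_fingerprinting": 1,
    "browser_functions_modified": 1,
    "captcha_on_load": 5,
    "captcha_after_rate_limit": 4,
    "waf_datadome_perimeterx": 4,
    "waf_akamai_imperva": 3,
    "aggressive_rate_limiting": 3,
    "waf_cloudflare": 2,
    "honeypots": 2,
    "tls_fingerprinting": 1,
}

MAX_SCORE = 10  # assumption: totals are capped to the 0-10 scale


def difficulty_score(signals: set[str]) -> int:
    """Sum the points for every detected signal, capped at MAX_SCORE."""
    total = sum(POINTS[s] for s in signals)
    return min(total, MAX_SCORE)
```

For example, `difficulty_score({"known_bot_service", "canvas_fingerprinting", "waf_cloudflare"})` returns `5`.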

Score interpretation remains the same:

  • 0-2: Easy (basic scraping will work)
  • 3-4: Medium (need some precautions)
  • 5-7: Hard (requires advanced techniques)
  • 8-10: Very Hard (consider using a service)
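Mapping a score to these bands is a simple threshold lookup. Here is a sketch, assuming integer scores on the 0-10 scale. The function name `interpret` is hypothetical.

```python
def interpret(score: int) -> str:
    """Map a 0-10 difficulty score to the bands listed above."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score <= 2:
        return "Easy"
    if score <= 4:
        return "Medium"
    if score <= 7:
        return "Hard"
    return "Very Hard"
```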

💡 Recommendation Updates

The recommendation engine now provides guidance when advanced protections are detected:

New tool recommendations:

  • Suggests undetected-chromedriver or playwright-stealth when behavioral tracking is detected
  • Recommends specific evasion techniques when browser functions are modified
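The tool-recommendation rules above can be sketched as a small rule function. The `Findings` dataclass and `recommend_tools` are hypothetical stand-ins for caniscrape's internal types. The tip wording is illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Findings:
    """Hypothetical container for analyzer results (not caniscrape's real type)."""
    behavioral_tracking: bool = False
    modified_functions: list[str] = field(default_factory=list)


def recommend_tools(f: Findings) -> list[str]:
    """Turn analyzer findings into tool suggestions following the rules above."""
    tips = []
    if f.behavioral_tracking:
        tips.append("Use undetected-chromedriver or playwright-stealth.")
    for name in f.modified_functions:
        tips.append(f"Check evasion options for modified function: {name}")
    return tips
```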

New strategy tips:

  • Warns about canvas fingerprinting and suggests mitigation
  • Advises on handling sites with behavioral tracking (slow, human-like movements)
  • Explains implications of detected bot detection services
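For sites with behavioral tracking, "slow, human-like movements" generally means avoiding cursor jumps that are instant, perfectly straight, and evenly timed. The sketch below generates a straight-line path with cosine ease-in-out, small pixel jitter, and randomized delays. `human_like_path` and all of its defaults are assumptions. Many tools use curved (for example Bezier) paths instead.

```python
import math
import random


def human_like_path(start, end, steps=25, jitter=2.0, seed=None):
    """Return (x, y, delay_ms) points easing from start to end with small noise.

    A cosine ease-in-out makes the cursor accelerate and then decelerate.
    Intermediate points get random pixel jitter, and every move gets a
    random delay. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        eased = (1 - math.cos(math.pi * t)) / 2  # 0 -> 1, slow at both ends
        x = x0 + (x1 - x0) * eased
        y = y0 + (y1 - y0) * eased
        if i < steps:  # land exactly on the target at the final step
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        delay_ms = rng.uniform(8, 30)
        points.append((x, y, delay_ms))
    return points
```

Each point could then be replayed with Playwright's `page.mouse.move(x, y)` followed by `page.wait_for_timeout(delay_ms)`.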

🛠️ How to Use the New Features

The new analyzers run automatically—no new flags required:

caniscrape https://example.com

Example output for a heavily protected site:

🛡️  ACTIVE PROTECTIONS

    ❌ Advanced Bot Detection:
       - Known Services Found: PerimeterX (HUMAN), DataDome
    
    ⚠️  Suspicious Signals:
       - Canvas Fingerprinting Suspected (canvas function is not native)
       - Behavioral Tracking Suspected (listeners found for: mousemove, scroll, keydown)
    
    ❌ Browser Integrity Compromised:
       - Function "navigator.webdriver" was modified.
         Reason: Indicator of Headless Browser Evasion.
       - Function "HTMLCanvasElement.prototype.toDataURL" was modified.
         Reason: Strong indicator of Canvas Fingerprinting.

🧪 Known Limitations

  • False Negatives: Some sites use obfuscated or custom protection systems that may not be detected
  • Detection Arms Race: Bot detection evolves constantly—we'll keep updating signatures

What's Next (v0.4.0 Preview)

The next release will focus on improving detection for tough sites:

  • Enhanced analysis for Amazon, YouTube, and other major platforms
  • Better handling of sites with multiple protection layers
  • Improved CAPTCHA detection accuracy
  • More granular scoring for different types of protections

📬 Feedback & Bug Reports

Found a site where detection isn't working? Encountered a crash? Have suggestions for improvement?

Open an issue on GitHub: https://github.com/ZA1815/caniscrape/issues