
v2.0.0-beta

Pre-release
@z7r1k3 released this 12 May 01:38
a511523

New Feature

  • Added unique logging, the new default logging option. It abandons the tree format and simply logs each discovered URL once, and only once. Standard and redundant logging, which keep the usual tree format, are still available via user input.
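
For illustration, a minimal sketch of the idea behind unique logging, assuming a simple seen-set; the names here are illustrative, not the crawler's actual identifiers:

```python
seen = set()

def log_unique(url, log_file):
    """Write url to log_file only the first time it is seen."""
    if url not in seen:
        seen.add(url)
        log_file.write(url + "\n")

with open("crawl.log", "w") as log:
    for url in ["https://a.example", "https://b.example", "https://a.example"]:
        log_unique(url, log)  # the repeated URL is silently skipped
```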

Bugfix

  • Redundantly logged URLs are now fully accurate in both the logs and terminal output. If you see a URL in the log/output tree without any child URLs, it either has no children or wasn't crawled (unless the crawler hit the depth limit, of course).
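
For context, the tree format indents each child URL under the page it was discovered on, roughly like this (illustrative output, not verbatim):

```
https://example.com
    https://example.com/about
        https://example.com/about/team
    https://example.com/contact
```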

Changes to Logging and Terminal Output

  • Standard logging now omits the parent URL when it is skipped because it was already crawled. This makes the log/output more streamlined and less confusing.
  • Standard output now prints all URLs, but no Error/Info messages (those are still logged); see the sketch below.
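
A rough sketch of that split using Python's standard logging module; the setup and names are assumptions, not the project's actual code:

```python
import logging

logging.basicConfig(filename="crawl.log", level=logging.INFO)
log = logging.getLogger("crawler")

def report(url):
    print(url)                   # terminal: every crawled URL
    log.info("Crawled %s", url)  # log file: URL plus Info diagnostics

def report_error(url, err):
    log.error("Failed to open %s: %s", url, err)  # logged, never printed
```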

Improvements to User Input

  • All input options now individually check for valid input. This means if you mess up one option, you won't have to re-input all of them.
  • Added more defaults, allowing the user to just mash enter after inputting the URL(s) to crawl.
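
A hypothetical sketch of per-option prompting; the option names, defaults, and valid ranges are made up for illustration:

```python
def ask(prompt, default, valid):
    """Re-prompt for this one option until the answer is valid.
    An empty answer (just pressing enter) falls back to the default."""
    while True:
        answer = input(f"{prompt} [{default}]: ").strip() or default
        if answer in valid:
            return answer
        print(f"Invalid choice {answer!r}, try again.")

depth = ask("Crawl depth", "3", {str(n) for n in range(1, 10)})
mode = ask("Logging mode", "unique", {"unique", "standard", "redundant"})
```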

Decreased Timeout

  • The timeout for opening a URL has been decreased to 20 seconds. If the crawler is hanging on a specific URL, this forces it to move on sooner.
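
In code, the change amounts to passing a shorter timeout when opening each URL. A minimal sketch assuming urllib; the fetch helper below is hypothetical:

```python
import socket
from urllib.request import urlopen
from urllib.error import URLError

TIMEOUT_SECONDS = 20  # the new, shorter limit

def fetch(url):
    """Return the page body, or None if the URL errors out or times out."""
    try:
        with urlopen(url, timeout=TIMEOUT_SECONDS) as response:
            return response.read()
    except (URLError, socket.timeout):
        return None  # the crawl moves on instead of hanging on one URL
```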

Added Caching

  • The crawler now caches prefixes. This streamlines the debug log, rather than spamming it with "No prefix detected" messages.
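
A minimal sketch of the caching idea; detect_prefix and the debug output below are stand-ins, not the real implementation:

```python
prefix_cache = {}

def detect_prefix(url):
    """Stand-in for the real prefix-detection logic."""
    for scheme in ("https://", "http://"):
        if url.startswith(scheme):
            return scheme
    return None

def get_prefix(url):
    # Resolve each URL's prefix once; repeat lookups hit the cache,
    # so the "No prefix detected" debug line fires at most once per URL.
    if url not in prefix_cache:
        prefix = detect_prefix(url)
        if prefix is None:
            print("DEBUG: No prefix detected:", url)
        prefix_cache[url] = prefix
    return prefix_cache[url]
```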

Refactored Default Variables

  • All default options are now placed at the top of the file. This lets the user change options that are not prompted for at runtime, such as the log file location or timeout.
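
The layout looks something like this; the variable names and values are illustrative, not the file's actual contents:

```python
# --- Default options: edit here to change settings not prompted for at runtime ---
LOG_FILE_PATH = "logs/crawl.log"  # where logs are written
TIMEOUT_SECONDS = 20              # per-URL timeout
DEFAULT_DEPTH = 3                 # crawl depth used when defaults are accepted
DEFAULT_LOG_MODE = "unique"       # unique / standard / redundant
```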