Releases: z7r1k3/creeper
v2.0.0-beta
Feature Addition
- Added unique logging, the new default logging option. It abandons the tree format and instead logs each discovered URL once, and only once, as a flat list (see the sketch below). Standard and redundant logging, which keep the usual tree format, are still available upon user input.
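A minimal sketch of the idea behind unique logging, assuming a simple set-based de-duplication; the names here are illustrative, not the crawler's actual internals:

```python
# Hypothetical sketch: each discovered URL is written to the log once,
# no matter how many pages link to it.
logged_urls = set()

def log_unique(url, log_file="urls.log"):
    """Append the URL to the log only the first time it is seen."""
    if url in logged_urls:
        return
    logged_urls.add(url)
    with open(log_file, "a") as log:
        log.write(url + "\n")
```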
Bugfix
- Redundantly logged URLs are now fully accurate in both the logs and the terminal output. If a URL appears in the log/output tree without any child URLs, it either has none or was not crawled (unless the crawler hit the depth limit, of course).
Changes to Logging and Terminal Output
- Standard logging now omits the parent URL if it is being skipped because it was already crawled. This makes the log/output more streamlined and less confusing.
- Standard output now prints all URLs, but does not print any Error/Info messages (they are still logged).
Improvements to User Input
- All input options are now validated individually, so a mistake in one option no longer forces you to re-enter all of them (see the sketch after this list).
- Added more defaults, allowing the user to simply press Enter through the remaining prompts after entering the URL(s) to crawl.
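A rough sketch of how per-option prompting with defaults can work; the function name, prompt text, and default value below are assumptions for illustration:

```python
# Hypothetical sketch: each option validates its own input and falls back to a
# default on an empty answer, so one bad entry never forces a full restart.
def prompt_int(message, default):
    """Keep asking until the user enters a whole number or accepts the default."""
    while True:
        raw = input(f"{message} [{default}]: ").strip()
        if raw == "":
            return default  # pressing Enter accepts the default
        try:
            return int(raw)
        except ValueError:
            print("Please enter a whole number.")

depth = prompt_int("Crawl depth", 3)
```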
Decreased Timeout
- The timeout for opening a URL has been decreased to 20 seconds. If the crawler is hanging on a specific URL, this forces it to move on sooner.
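As a hedged illustration of the behavior (not the crawler's exact fetch code), the standard library makes this a one-argument change:

```python
# Hypothetical sketch: cap how long the crawler waits when opening a URL.
from urllib.request import urlopen

TIMEOUT_SECONDS = 20  # illustrative constant name

def fetch(url):
    """Return the page body, or None if the request fails or times out."""
    try:
        with urlopen(url, timeout=TIMEOUT_SECONDS) as response:
            return response.read()
    except OSError:  # covers URLError, timeouts, and other socket errors
        return None
```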
Added Caching
- The crawler now caches prefixes, resulting in a more streamlined debug log instead of repeated "No prefix detected" spam.
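A minimal sketch of the caching idea, assuming a dictionary keyed by host; the names and the debug message handling are illustrative only:

```python
# Hypothetical sketch: remember the detected prefix per host so the debug log
# records "No prefix detected" once instead of on every repeated lookup.
from urllib.parse import urlparse

prefix_cache = {}

def get_prefix(url):
    """Return the scheme prefix for a URL, computing and logging it once per host."""
    parsed = urlparse(url)
    host = parsed.netloc or url
    if host in prefix_cache:
        return prefix_cache[host]
    prefix = parsed.scheme + "://" if parsed.scheme else None
    if prefix is None:
        print("DEBUG: No prefix detected for", host)
    prefix_cache[host] = prefix
    return prefix
```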
Refactored Default Variables
- All default options are now placed at the top of the file. This allows the user to change options that are not prompted for at runtime, such as the log file location or the timeout.
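For example, a script structured this way keeps all tweakable values in one visible block; the specific names and values below are examples, not the real defaults:

```python
# Hypothetical sketch: module-level defaults at the top of the file, editable
# without touching the rest of the code.
LOG_DIRECTORY = "logs/"
ERROR_LOG_DIRECTORY = "logs/error/"
TIMEOUT_SECONDS = 20
DEFAULT_CRAWL_DEPTH = 3
DEFAULT_LOG_MODE = "unique"
```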
v1.4.2-beta
Improved Logging
This pre-release brings some small but significant improvements to logging. Debug logs contain more information, and URL logs have a more accurate structure.
v1.4.1-beta
Code Refactor
Code now meets typical Python style standards.
v1.4.0-beta
Debug Logging
The new debug log is much more robust, with critical non-error information that will help solve issues in the future.
Code Refactoring
The code as a whole has been refactored further, with a primary focus on making it easy to add and modify log entries.
Prompt Defaults
The program now features default selections: if user input is left empty, the default is automatically used.
Beta
This project is still in beta. Updating all tags to reflect that.
v1.3.0-beta
Feature Implementations
Added a new option that allows the user to disable redundant logging for URLs that were already crawled and logged. Doing so speeds up the overall crawl, since writing to the .txt log and the console output takes time.
User input other than the URL is now checked and, if the type is incorrect, the program prompts for valid input instead of simply throwing an exception.
v1.2.0-beta
Error Logging
This release introduces proper error logging. Whenever an error occurs, it is logged to the .error folder, along with the full exception thrown if applicable. Errors shown in the program output include a unique code that can be used to look up the error in the applicable log. Errors are always logged to a file, regardless of user settings.
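A hedged sketch of the mechanism, with the code format, file path, and function name invented for illustration:

```python
# Hypothetical sketch: log the error with a short unique code and, if an
# exception is available, the full traceback; print the same code to the terminal.
import traceback
from datetime import datetime
from uuid import uuid4

def log_error(message, exc=None, log_path="error.log"):
    code = uuid4().hex[:8]  # short code the user can search for in the log
    with open(log_path, "a") as log:
        log.write(f"[{datetime.now().isoformat()}] [{code}] {message}\n")
        if exc is not None:
            log.write("".join(traceback.format_exception(type(exc), exc, exc.__traceback__)))
    print(f"Error {code}: {message} (see {log_path} for details)")
    return code
```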
Code Refactoring
Further progress has been made to make the code more readable and efficient. While it most definitely isn't perfect, it seems satisfactory for the moment.
v1.1.0-beta
Various Bugfixes
URLs are now handled much more accurately, meaning more discovered URLs will be eligible for crawling.
Other previously unknown bugs have been fixed as well.
Feature Improvements
Tag, attribute, and file ending lists have been expanded to account for more links that were previously being ignored.
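To illustrate the list-driven approach (with example lists, not the crawler's exact ones), extraction can be as simple as checking each start tag against configurable lists:

```python
# Hypothetical sketch: tags and attributes to inspect live in plain lists,
# so widening link coverage is a matter of adding entries.
from html.parser import HTMLParser

LINK_TAGS = ["a", "link", "img", "script", "iframe"]  # example values
LINK_ATTRIBUTES = ["href", "src"]                     # example values

class LinkExtractor(HTMLParser):
    """Collect attribute values that look like links from the listed tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in LINK_TAGS:
            return
        for name, value in attrs:
            if name in LINK_ATTRIBUTES and value:
                self.links.append(value)
```

Feeding a page's HTML to an instance via `feed()` and reading its `links` list yields every candidate URL found in those tags.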
Code Refactoring
Major improvements to the code structure. This is still a work in progress, but it is vastly better than before. If you value your eyes, I recommend avoiding the previous commits.
v1.0.6-beta
Completely whitelists the original URL from any qualifying checks. This serves as a failsafe if, for example, a link is incorrectly not crawled because it is detected as an unqualified file type. The user can then take that URL and crawl it separately; assuming it is a valid HTML or FTP page, the crawler will handle it properly.
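A minimal sketch of that failsafe, with a placeholder filter list and function name:

```python
# Hypothetical sketch: the user-supplied URL bypasses the file-type filter,
# so it is always eligible for crawling; everything else is checked as usual.
SKIPPED_ENDINGS = (".jpg", ".png", ".pdf", ".zip")  # example filter, not the real list

def should_crawl(url, original_url):
    if url == original_url:
        return True
    return not url.lower().endswith(SKIPPED_ENDINGS)
```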
Fully "production ready" release will include a detailed README.md, but the crawler itself is production ready.