Crawler Finished Time added to done list and improvements #45

N0taN3rd · 2019-04-24T21:37:10Z

implemented adding the time a crawler finishes to the done list (fixes #44)
implemented the ability to specify screenshot dimensions via the SCREENSHOT_DIMENSIONS environment key
improved the handling of out link collection in order to handle pages with 1k+ links present in the DOM at a time
switched to using ABCMeta and EventEmitterS in order to ensure we are not forcing a __dict__ on classes that do not opt in for one
unified the __slots__ format and removed unnecessary ABC usage in intermediate abstract classes
updated README with new environment variables
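
To illustrate the `__dict__` point above, here is a minimal sketch of how `__slots__` interacts with base classes. The class names (`PlainBase`, `SlottedBase`, and so on) are hypothetical and are not taken from this PR; the sketch only shows the general Python behavior, namely that a subclass gets a `__dict__` whenever any class in its hierarchy omits `__slots__`, while an abstract base declared with `metaclass=ABCMeta` and an explicit empty `__slots__` avoids that.

```python
from abc import ABCMeta, abstractmethod


class PlainBase:
    """Hypothetical base class that does not declare __slots__."""


class SlottedChildOfPlain(PlainBase):
    # Declaring __slots__ here is not enough: PlainBase already
    # gives every instance a __dict__.
    __slots__ = ("value",)

    def __init__(self, value: int) -> None:
        self.value = value


class SlottedBase(metaclass=ABCMeta):
    """Hypothetical abstract base that opts out of __dict__."""

    __slots__ = ()

    @abstractmethod
    def run(self) -> str:
        ...


class SlottedChild(SlottedBase):
    __slots__ = ("value",)

    def __init__(self, value: int) -> None:
        self.value = value

    def run(self) -> str:
        return f"ran {self.value}"


def has_instance_dict(obj: object) -> bool:
    """Return True if the instance carries a per-instance __dict__."""
    return hasattr(obj, "__dict__")


if __name__ == "__main__":
    print(has_instance_dict(SlottedChildOfPlain(1)))  # dict forced by PlainBase
    print(has_instance_dict(SlottedChild(1)))         # no dict, slots only
```

With every class in the hierarchy declaring `__slots__`, assigning an undeclared attribute raises `AttributeError`, which is one way to notice a class that accidentally opted in to a `__dict__`.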


…order to be more clear about what it does; when the all_frames keyword arg of collect_outlinks (previously manual_collection) is true, both all-frame and behavior out link collection occur rather than one or the other; added typing to BehaviorTabs __slots__ to make linting happy

ikreymer · 2019-04-26T02:19:16Z

autobrowser/tabs/crawlerTab.py

@@ -165,7 +197,8 @@ def main_frame_getter(self) -> Frame:
                    exc_info=e,
                )

-        self.logger.info(logged_method, "crawl loop task ended")
+        end_info = Helper.json_string(id=self.reqid, time=time.time())


Suggested change

-        end_info = Helper.json_string(id=self.reqid, time=time.time())
+        end_info = Helper.json_string(id=self.reqid, time=int(time.time()))

Otherwise includes a long decimal for microseconds, don't really need that :)
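
A small sketch of the difference this suggestion makes. `json_string` below is a hypothetical stand-in for `Helper.json_string` (its real implementation is not shown in this PR) and is assumed to simply serialize its keyword arguments as JSON; `build_end_info` and the sample request id are likewise made up for illustration.

```python
import json
import time


def json_string(**kwargs) -> str:
    """Hypothetical stand-in for Helper.json_string: dump kwargs as JSON."""
    return json.dumps(kwargs)


def build_end_info(reqid: str, now: float) -> str:
    """Build the crawler-finished entry with a whole-second timestamp."""
    # int() truncates the fractional (sub-second) part of the epoch time.
    return json_string(id=reqid, time=int(now))


if __name__ == "__main__":
    now = time.time()
    # Float timestamp: serialized with its full fractional part.
    print(json_string(id="example-reqid", time=now))
    # Integer timestamp: whole seconds only.
    print(build_end_info("example-reqid", now))
```

Truncating to whole seconds keeps the stored entry short and is enough when consumers only need to know roughly when a crawl finished.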

N0taN3rd requested a review from ikreymer April 24, 2019 21:37

N0taN3rd added 2 commits April 24, 2019 18:31

re-added manual out link collection in-order to handle pages with fun…

706b44d

…ky iframe/frame usage

ikreymer reviewed Apr 26, 2019


convert the float returned by time.time() to an int per review comment

d105aa3

N0taN3rd requested a review from ikreymer April 26, 2019 15:07

ikreymer merged commit 8d39df4 into master Apr 26, 2019

ikreymer deleted the crawler-done-timing-and-tweaks branch April 26, 2019 20:22
