FormAuthInfo appears to log in successfully to website X with any username, valid or invalid. Even with a valid username and password, the crawler only visits the login page.
16:09:55.821 [main] INFO edu.uci.ics.crawler4j.fetcher.PageFetcher - FORM authentication for: /login
16:09:56.286 [main] DEBUG edu.uci.ics.crawler4j.fetcher.PageFetcher - Successfully Logged in with user: 1juliusahenkora@gmail.com to: www.phptravels.net
16:09:56.297 [main] INFO edu.uci.ics.crawler4j.url.TLDList - Obtained 6791 TLD from packaged file tld-names.txt
16:09:56.331 [main] DEBUG edu.uci.ics.crawler4j.util.IO - Deleting content of: /Users/juliusahenkora/Documents/NTestWebCrawlV1.0/data/crawl/root/frontier
16:09:56.332 [main] INFO edu.uci.ics.crawler4j.crawler.CrawlController - Deleted contents of: data/crawl/root/frontier ( as you have configured resumable crawling to false )
16:09:57.180 [main] DEBUG edu.uci.ics.crawler4j.robotstxt.RobotstxtServer - Can't read this robots.txt: http://www.phptravels.net/robots.txt as it's status code is 404

http://www.phptravels.net/login
http://www.phptravels.net/login/fr
http://www.phptravels.net/login/ru
http://www.phptravels.net/login/en
http://www.phptravels.net/login/es
http://www.phptravels.net/login/ar
http://www.phptravels.net/login/tr
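Because the "Successfully Logged in" message shows up even for invalid usernames, it does not by itself show that the server accepted the credentials; the crawled URLs, all variants of `/login`, may also be consistent with a session that never authenticated. One way to check is to look at where the server sends the client after the login POST. The following is a minimal sketch under stated assumptions, not crawler4j code: `LoginCheck` and `looksLoggedIn` are hypothetical names, and it assumes the site redirects back to `/login` (or a localized variant such as `/login/fr`) on failure and somewhere else on success, and optionally that some marker text appears only on the login form.

```java
import java.net.URI;

// Hypothetical helper, NOT part of crawler4j: a heuristic for judging whether a
// form login actually succeeded, given the server's response to the login POST.
// Assumes failed logins redirect back to the login path (or a localized variant).
final class LoginCheck {

    /**
     * @param statusCode  HTTP status of the response to the login POST
     * @param redirectUrl Location header of a 3xx response, or the final URL after
     *                    following redirects; may be null
     * @param loginPath   path of the login form, e.g. "/login" (assumption)
     * @param body        response body; may be null
     * @param formMarker  text that only appears on the login form (assumption); may be null
     * @return true if nothing in the response suggests the login was rejected
     */
    static boolean looksLoggedIn(int statusCode, String redirectUrl, String loginPath,
                                 String body, String formMarker) {
        // 4xx/5xx responses are never treated as a successful login.
        if (statusCode >= 400) {
            return false;
        }
        if (redirectUrl != null) {
            // getPath() works for absolute ("http://host/login/fr") and relative ("/login") URLs.
            String path = URI.create(redirectUrl).getPath();
            // Being sent back to the login page or a localized variant of it
            // (e.g. /login/fr) usually means the credentials were rejected.
            if (path != null && (path.equals(loginPath) || path.startsWith(loginPath + "/"))) {
                return false;
            }
        }
        // A body that still contains the login form also suggests no session was created.
        if (body != null && formMarker != null && body.contains(formMarker)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Redirected back to a localized login page: treated as a failed login (prints false).
        System.out.println(looksLoggedIn(302, "http://www.phptravels.net/login/fr", "/login", null, null));
        // Redirected elsewhere ("/account/" is a made-up path): treated as success (prints true).
        System.out.println(looksLoggedIn(302, "http://www.phptravels.net/account/", "/login", null, null));
    }
}
```

The sketch inspects the redirect target rather than the status code alone because many login forms answer both successful and failed attempts with a 302 redirect; only the destination differs.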