FormAuthInfo Authentication Error #291

Closed
jahenkor opened this issue Feb 23, 2018 · 2 comments

@jahenkor

FormAuthInfo appears to log in to the target website successfully for any username, valid or invalid. Even with a valid username and password, the crawler only crawls the login page.

16:09:55.821 [main] INFO edu.uci.ics.crawler4j.fetcher.PageFetcher - FORM authentication for: /login
16:09:56.286 [main] DEBUG edu.uci.ics.crawler4j.fetcher.PageFetcher - Successfully Logged in with user: 1juliusahenkora@gmail.com to: www.phptravels.net
16:09:56.297 [main] INFO edu.uci.ics.crawler4j.url.TLDList - Obtained 6791 TLD from packaged file tld-names.txt
16:09:56.331 [main] DEBUG edu.uci.ics.crawler4j.util.IO - Deleting content of: /Users/juliusahenkora/Documents/NTestWebCrawlV1.0/data/crawl/root/frontier
16:09:56.332 [main] INFO edu.uci.ics.crawler4j.crawler.CrawlController - Deleted contents of: data/crawl/root/frontier ( as you have configured resumable crawling to false )
16:09:57.180 [main] DEBUG edu.uci.ics.crawler4j.robotstxt.RobotstxtServer - Can't read this robots.txt: http://www.phptravels.net/robots.txt as it's status code is 404
http://www.phptravels.net/login
http://www.phptravels.net/login/fr
http://www.phptravels.net/login/ru
http://www.phptravels.net/login/en
http://www.phptravels.net/login/es
http://www.phptravels.net/login/ar
http://www.phptravels.net/login/tr

@ravi-katiyar

You might want to look inside the crawler4j library methods. This log line is deceptive:

Successfully Logged in with user: 1juliusahenkora@gmail.com to: www.phptravels.net

Don't go by this. Instead, check the response code returned by the login page API being crawled.
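
One way to check this outside crawler4j is to send the login form yourself and look at the status code and `Location` header. The following is only a rough sketch using the JDK's `java.net.http` client (Java 11+), not crawler4j code. The class `LoginCheck`, the `Outcome` enum, and the redirect heuristic are assumptions made for illustration, and a real site may signal a failed login differently (for example with a 200 response that re-renders the form).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of crawler4j.
class LoginCheck {

    enum Outcome { FAILED, LIKELY_OK, INCONCLUSIVE }

    // Heuristic (an assumption): 4xx/5xx means failure, a redirect back to
    // the login path means failure, a redirect elsewhere is probably success,
    // and anything else needs a look at the response body.
    static Outcome classify(int status, String location, String loginPath) {
        if (status >= 400) {
            return Outcome.FAILED;
        }
        if (status >= 300) {
            if (location == null) {
                return Outcome.INCONCLUSIVE;
            }
            String path = URI.create(location).getPath();
            if (path == null) {
                return Outcome.INCONCLUSIVE;
            }
            return path.startsWith(loginPath) ? Outcome.FAILED : Outcome.LIKELY_OK;
        }
        return Outcome.INCONCLUSIVE;
    }

    // Posts the form without following redirects, prints what came back,
    // and classifies it. Field names are supplied by the caller.
    static Outcome probe(String loginUrl, String userField, String user,
                         String passField, String pass) throws Exception {
        String body = enc(userField) + "=" + enc(user)
                + "&" + enc(passField) + "=" + enc(pass);
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NEVER)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        String location = response.headers().firstValue("Location").orElse(null);
        System.out.println("status=" + response.statusCode() + " location=" + location);
        return classify(response.statusCode(), location, URI.create(loginUrl).getPath());
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Calling `LoginCheck.probe(...)` with the login URL and the form's actual field names (which have to be read from the page's HTML; none are given in this thread) once with valid and once with invalid credentials should show whether the server treats the two differently, regardless of what the crawler logs.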

@s17t
Contributor

s17t commented Mar 12, 2018

Indeed, the log line is deceptive.

s17t self-assigned this Mar 12, 2018
s17t closed this as completed Mar 19, 2018