Factory instead of hardcoded class.newInstance() #84

Closed
rzo1 opened this issue Jul 20, 2015 · 8 comments

Comments

@rzo1
Contributor

rzo1 commented Jul 20, 2015

I would like to suggest adding the possibility to use a factory to create new web crawlers; it would be of great value.

Since a web crawler may hold a few custom services (e.g. classifiers, database services), a factory would make crawler4j usable with, for example, Spring.

A few years ago an issue was created on Google Code (https://code.google.com/p/crawler4j/issues/detail?id=144), which my request duplicates, but nothing happened. Is there a reason for not including a factory approach in the code base?

Thanks in advance.
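
To make the motivation concrete: a reflective `Class.newInstance()` call can only use a no-argument constructor, so a crawler that needs collaborators cannot receive them that way, whereas a factory can pass them in. The following is a minimal sketch of that idea, not crawler4j code. `Crawler`, `CrawlerFactory`, `Classifier`, `ClassifyingCrawler`, and `ClassifyingCrawlerFactory` are hypothetical stand-ins so the snippet compiles without crawler4j on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins (NOT crawler4j types) so the sketch compiles on its own.
interface Crawler {
    String visit(String url);
}

interface CrawlerFactory<T extends Crawler> {
    T newInstance() throws Exception;
}

// A custom service that a crawler might depend on.
interface Classifier {
    String classify(String url);
}

// A crawler with a constructor dependency: it has no no-arg constructor,
// so a reflective Class.newInstance() call could not create it.
class ClassifyingCrawler implements Crawler {
    private final Classifier classifier;

    ClassifyingCrawler(Classifier classifier) {
        this.classifier = classifier;
    }

    @Override
    public String visit(String url) {
        return classifier.classify(url) + ":" + url;
    }
}

// The factory owns the dependency and hands it to every crawler it creates.
class ClassifyingCrawlerFactory implements CrawlerFactory<ClassifyingCrawler> {
    private final Classifier classifier;

    ClassifyingCrawlerFactory(Classifier classifier) {
        this.classifier = classifier;
    }

    @Override
    public ClassifyingCrawler newInstance() {
        return new ClassifyingCrawler(classifier);
    }
}

class FactoryDemo {
    public static void main(String[] args) throws Exception {
        Classifier byScheme = url -> url.startsWith("https") ? "secure" : "plain";
        CrawlerFactory<ClassifyingCrawler> factory = new ClassifyingCrawlerFactory(byScheme);

        // One crawler per worker, each created through the factory.
        List<Crawler> crawlers = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            crawlers.add(factory.newInstance());
        }
        System.out.println(crawlers.get(0).visit("https://example.org"));
    }
}
```

In a Spring setup the factory itself would typically be the bean that receives the injected services, which is what makes this approach friendly to dependency injection.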

@yasserg
Owner

yasserg commented Jul 20, 2015

It is already implemented but not well documented yet. See WebCrawlerFactory in https://github.com/yasserg/crawler4j/blob/master/src/main/java/edu/uci/ics/crawler4j/crawler/CrawlController.java

@rzo1
Contributor Author

rzo1 commented Jul 20, 2015

Ah, I see. Thanks for your answer. But it is not included in the current release of crawler4j?
So I probably have to use the SNAPSHOT if I want to use this feature?

@rzo1 rzo1 closed this as completed Jul 20, 2015
@yasserg
Owner

yasserg commented Jul 20, 2015

Yes

@s17t
Contributor

s17t commented Nov 20, 2015

+1 for a factory or any other way that allows dependency injection. Is a new stable release that includes this feature scheduled?

@rzo1
Contributor Author

rzo1 commented Nov 20, 2015

@s17t Just get the source code and build the current SNAPSHOT yourself. The snapshot provides a WebCrawlerFactory, which you can use for DI, e.g. with Spring.

@epubreader

I know you have added WebCrawlerFactory, but how do I use it with Spring DI? Can you give an example? Thanks.

@s17t
Contributor

s17t commented May 11, 2016

@eimhee, something like this:

public class CrawlerCrawlerControllerFactory implements CrawlController.WebCrawlerFactory<WebCrawler> {

    // Accept whatever dependencies your crawler needs (elided here).
    public CrawlerCrawlerControllerFactory(...) {
        ...
    }

    @Override
    public WebCrawler newInstance() {
        // Or a new instance of your own WebCrawler subclass.
        return new edu.uci.ics.crawler4j.crawler.WebCrawler(...);
    }
}

Then in Spring:

@Bean
public CrawlController crawler() throws Exception {

    CrawlerCrawlerControllerFactory factory = new CrawlerCrawlerControllerFactory(...);

    // Configure the CrawlController accordingly; config, pageFetcher,
    // robotstxtServer and numberOfCrawlers are assumed to be defined elsewhere.
    CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
    controller.startNonBlocking(factory, numberOfCrawlers);

    return controller;
}

@rzo1
Contributor Author

rzo1 commented May 11, 2016

As an alternative, you could implement ApplicationContextAware and get a WebCrawler-Bean from your ApplicationContext.

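import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;

import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.WebCrawler;

public class CustomCrawlerFactory<T extends WebCrawler> implements CrawlController.WebCrawlerFactory<T>, ApplicationContextAware {

    private ApplicationContext context;

    @Override
    @SuppressWarnings("unchecked")
    public T newInstance() throws Exception {
        // MyCrawler is your own WebCrawler subclass, registered as a bean.
        return (T) context.getBean(MyCrawler.class);
    }

    // Called by Spring once this factory has been created as a bean.
    @Override
    public void setApplicationContext(ApplicationContext context) {
        this.context = context;
    }
}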
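
One caveat with a container-backed factory like this, offered as an assumption rather than something stated in the thread: crawler4j appears to call `newInstance()` once per crawler thread, so each call should probably return a fresh object. Spring beans are singletons by default, so the crawler bean would likely need prototype scope. The sketch below illustrates the difference without Spring, using a `Supplier` as a stand-in for the container lookup. `StatefulCrawler` and `LookupCrawlerFactory` are hypothetical names, not crawler4j or Spring types.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a crawler type (NOT a crawler4j class).
class StatefulCrawler {
    private int pagesVisited = 0;

    int visit() {
        return ++pagesVisited;
    }
}

// Factory that delegates creation to a lookup function, much like
// a factory that asks an ApplicationContext for a bean.
class LookupCrawlerFactory {
    private final Supplier<StatefulCrawler> lookup;

    LookupCrawlerFactory(Supplier<StatefulCrawler> lookup) {
        this.lookup = lookup;
    }

    StatefulCrawler newInstance() {
        return lookup.get();
    }
}

class ScopeDemo {
    public static void main(String[] args) {
        // "Singleton" lookup: every call returns the same shared object.
        StatefulCrawler shared = new StatefulCrawler();
        LookupCrawlerFactory singleton = new LookupCrawlerFactory(() -> shared);

        // "Prototype" lookup: every call builds a new object.
        LookupCrawlerFactory prototype = new LookupCrawlerFactory(StatefulCrawler::new);

        System.out.println(singleton.newInstance() == singleton.newInstance()); // true
        System.out.println(prototype.newInstance() == prototype.newInstance()); // false
    }
}
```

With the singleton-style lookup, all crawler threads would share one object and its per-crawler state; the prototype-style lookup gives each thread its own instance.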
