TohidN/quaero

Description

The search engine is still in development.

Installation:

  1. Install the required packages.
  2. Download the "punkt" dataset for NLTK:
$ python3
>>> import nltk
>>> nltk.download('punkt')

Task List

  • User Management (login, signup, profile page, ...) powered by Django Boilerplate
  • Offline Data Storage Models (Sites, Pages, Links)
  • Simple HTML page scraper (powered by BeautifulSoup)
  • Crawler with a depth limit as its endpoint (simple page crawler powered by requests)
  • Crawling Tasks
  • Queues for Crawling Tasks (powered by Celery)
  • Template to list active and queued tasks
  • Backlink Counter
  • Templates (Search page, list of sites and their pages)
  • Extract Article content and title (remove extra HTML data such as sidebar, header, ...)
  • Indexed Search (powered by Solr and Haystack)
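
To illustrate how the depth-limited crawler and the backlink counter from the list above could fit together, here is a minimal sketch. It is not the project's code. The project uses requests and BeautifulSoup, while this sketch sticks to the Python standard library so it runs without extra packages. Every name in it (`LinkParser`, `extract_links`, `crawl`, `count_backlinks`, `SITE`) is hypothetical, and `fetch` is an assumed callable that returns a page's HTML or `None` on failure.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs linked from the page."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        links.append(absolute)
    return links


def crawl(start_url, fetch, max_depth):
    """Breadth-first crawl that follows links at most max_depth hops.

    Returns (pages, links): pages maps each fetched URL to its HTML,
    links is a list of (source, target) pairs found on fetched pages.
    """
    pages = {}
    links = []
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:  # assumed failure signal from fetch
            continue
        pages[url] = html
        for target in extract_links(url, html):
            links.append((url, target))
            # Only enqueue new pages while still inside the depth limit.
            if depth < max_depth and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return pages, links


def count_backlinks(links):
    """Count distinct pages linking to each target, ignoring self-links."""
    unique = {(source, target) for source, target in links if source != target}
    return Counter(target for _source, target in unique)


# Tiny in-memory "site" so the sketch runs offline (made-up URLs).
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": '<a href="/d">D</a>',
}

pages, links = crawl("http://example.test/", SITE.get, max_depth=1)
backlinks = count_backlinks(links)
```

With `max_depth=1` the crawl fetches the start page and the pages it links to directly, and stops there. To crawl real pages, `fetch` could be replaced by a small wrapper around `requests.get` that returns the response text. In the actual project the pages and links would presumably be stored in the Sites, Pages, and Links models rather than in memory.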

Tasks For Academic Research

  • Sentiment Analysis
  • Factoid Extraction and comparison