extensible Web Retrieval Toolkit
Clone or download
Permalink
Failed to load latest commit information.
doc fix: multiple math packages. Jul 10, 2017
examples/ontology chg: py3 preparation Nov 29, 2017
src
tests fix: move tests. Oct 31, 2018
uml
.gitignore fix: Handling errors with temporal attributes in unexpected places, s… Sep 18, 2018
CHANGELOG.txt updated changelog Feb 11, 2013
Doxyfile
GUIDELINES.md chg: conceptnet update and code cleanup Nov 15, 2015
LICENSE
MANIFEST.in
README.md
requirements.txt chg: commented out unused linkedin / removed oauth lib (py3 compat Nov 29, 2017
setup.cfg
setup.py
tox.ini chg: added lxml dependency to tox and nosetest command for google cus… Sep 5, 2014

README.md

extensible Web Retrieval Toolkit (eWRT)

The Extensible Web Retrieval Toolkit (eWRT) is a modular open-source Python API which

  1. offers a unified interface for retrieving social data from Web sources such as Delicious, Flickr, Yahoo! and Wikipedia,
  2. includes various helper classes for effective caching and data management,
  3. provides components for low-level natural language processing functionalities such as language detection, phonetic string similarity measures, and methods for string normalization.

Quickstart:

adjust eWRT/src/siteconfig.py-sample to your setting and save it to

  • ~/.eWRT/siteconfig.py (user specific settings) and/or
  • /etc/eWRT/siteconfig.py (system wide settings)

Packages:

  • eWRT.access - file, Web and database access
    • db - database access
    • file - file access
    • http - access web resources supporting authentication (basic, digest), compression, etc.
    • javascript - control Firefox to extract AJAX pages
  • eWRT.input - input and cleanup modules
    • clean - clean and normalize text phrases
    • conv - convert doc, html and pdf files to text documents; convert XCL to rdf
    • corpus - input readers for the Reuters and BBC corpus
    • csv - read and analyze csv files
    • stock - stock quotes
  • eWRT.ontology - tools for comparing, evaluating and visualizing ontologies
    • compare - compare ontology nodes, relations, and relation types
    • eval - determine the coherence of ontology nodes
    • visualize - visualize ontologies
  • eWRT.stat - the eWRT statistics packages
    • coherence - compute the coherence between terms (Dice, PMI)
    • metrics - evaluation metrics (precision, recall, F1)
    • language - simple language detection
    • string - word (Levenshtein, Damerau-Levenshtein, Soundex, ...) and document (Vector Space Model) similarity metrics
  • eWRT.util - utility classes for transparent caching, logging, monitoring, etc.
    • advLogging - log to SNMP handler
    • assert - assertion based counters (decorators)
    • async - asynchronous procedure calls (experimental)
    • cache - transparent memory and disk caching of function calls (decorators)
    • exception - SNMP exception handling
    • loggerProfile - simplified logging
    • module_path - compute relative paths
    • monitoring - support for Nagios NSCA services
    • pickleIterator - iterate over objects stored in pickle files
    • profile - python profiling
    • timing - time python methods (decorators)
  • eWRT.visualize - eWRT visualization library
  • eWRT.ws - Web service access (REST, Amazon, Flickr, Facebook, ...)
    • amazon
    • conceptnet
    • delicious
    • facebook
    • flickr
    • geonames
    • google
    • googlealerts
    • googletrends
    • linkedin
    • opencalais
    • rest - efficiently access/publish REST services
    • rss
    • technorati
    • twitter
    • wikipedia
    • wordnet
    • wot
    • yahoo
    • youtube

Requirements: