Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time

extensible Web Retrieval Toolkit (eWRT)

[Travis/Develop Branch]: Build Status

The Extensible Web Retrieval Toolkit (eWRT) is a modular open-source Python API which

  1. offers a unified interface for retrieving social data from Web sources such as Delicious, Flickr, Yahoo! and Wikipedia,
  2. includes various helper classes for effective caching and data management,
  3. provides components for low-level natural language processing functionalities such as language detection, phonetic string similarity measures, and methods for string normalization.


adjust eWRT/src/ to your setting and save it to

  • ~/.eWRT/ (user specific settings) and/or
  • /etc/eWRT/ (system wide settings)


  • eWRT.access - file, Web and database access
    • db - database access
    • file - file access
    • http - access web resources supporting authentication (basic, digest), compression, etc.
    • javascript - control Firefox to extract AJAX pages
  • eWRT.input - input and cleanup modules
    • clean - clean and normalize text phrases
    • conv - convert doc, html and pdf files to text documents; convert XCL to rdf
    • corpus - input readers for the Reuters and BBC corpus
    • csv - read and analyze csv files
    • stock - stock quotes
  • eWRT.ontology - tools for comparing, evaluating and visualizing ontologies
    • compare - compare ontology nodes, relations, and relation types
    • eval - determine the coherence of ontology nodes
    • visualize - visualize ontologies
  • eWRT.stat - the eWRT statistics packages
    • coherence - compute the coherence between terms (Dice, PMI)
    • metrics - evaluation metrics (precision, recall, F1)
    • language - simple language detection
    • string - word (Levenshtein, Damerau-Levenshtein, Soundex, ...) and document (Vector Space Model) similarity metrics
  • eWRT.util - utility classes for transparent caching, logging, monitoring, etc.
    • advLogging - log to SNMP handler
    • assert - assertion based counters (decorators)
    • async - asynchronous procedure calls (experimental)
    • cache - transparent memory and disk caching of function calls (decorators)
    • exception - SNMP exception handling
    • loggerProfile - simplified logging
    • module_path - compute relative paths
    • monitoring - support for Nagios NSCA services
    • pickleIterator - iterate over objects stored in pickle files
    • profile - python profiling
    • timing - time python methods (decorators)
  • eWRT.visualize - eWRT visualization library
  • - Web service access (REST, Amazon, Flickr, Facebook, ...)
    • amazon
    • conceptnet
    • delicious
    • facebook
    • flickr
    • geonames
    • google
    • googlealerts
    • googletrends
    • linkedin
    • opencalais
    • rest - efficiently access/publish REST services
    • rss
    • technorati
    • twitter
    • wikidata
    • wikipedia
    • wordnet
    • wot
    • yahoo
    • youtube