Commits on Mar 22, 2012
  1. Reformat README as Markdown/reStructuredText

    (Compatible with either markup syntax, using the common
    subset documented at https://gist.github.com/1855764 -
    note however that GitHub Markdown numbers the sections
    incorrectly, so this should be renamed README.rst).
    dupuy committed Mar 22, 2012
Commits on Mar 20, 2012
  1. Added .gitignore

    julen committed Mar 20, 2012
Commits on Jun 23, 2011
  1. Removed the feature that counts downloaded files

    svn path=/src/trunk/corpuscatcher/; revision=17552
    Laurette Pretorius committed Jun 23, 2011
Commits on Jun 6, 2011
  1. svn path=/src/trunk/corpuscatcher/; revision=17530

    Laurette Pretorius committed Jun 6, 2011
Commits on Jun 3, 2011
  1. Tries to align files in two folders - src and tgt language - using html structure, numbers and url correspondence
    
    svn path=/src/trunk/corpuscatcher/; revision=17504
    Laurette Pretorius committed Jun 3, 2011
  2. Improved pattern matching for urls by using regexes

    svn path=/src/trunk/corpuscatcher/; revision=17503
    Laurette Pretorius committed Jun 3, 2011
Commits on May 23, 2011
  1. Added an option (-e) to specify a pattern to be matched in the URLs to be downloaded.
    
    svn path=/src/trunk/corpuscatcher/; revision=17438
    Laurette Pretorius committed May 23, 2011
Commits on May 16, 2011
  1. Fixed a bug related to selecting encodings for html files

    svn path=/src/trunk/corpuscatcher/; revision=17401
    Laurette Pretorius committed May 16, 2011
Commits on Apr 6, 2011
  1. Assume that immediately consecutive lines are part of the same paragraph and join them. Split paragraphs in our outputs by two newlines.
    
    svn path=/src/trunk/corpuscatcher/; revision=17333
    friedelwolff committed Apr 6, 2011
  2. Some cleanup, simplification, reordering

    svn path=/src/trunk/corpuscatcher/; revision=17332
    friedelwolff committed Apr 6, 2011
  3. Better support for non-list output (output as running text)

    svn path=/src/trunk/corpuscatcher/; revision=17322
    friedelwolff committed Apr 6, 2011
Commits on Nov 19, 2008
  1. Suppress unnecessary warning about having the browser handle gzipped data
    
    svn path=/src/trunk/corpuscatcher/; revision=8967
    Walter Leibbrandt committed Nov 19, 2008
Commits on Nov 18, 2008
  1. Don't convert pages if there's nothing to convert.

    svn path=/src/trunk/corpuscatcher/; revision=8959
    Walter Leibbrandt committed Nov 18, 2008
  2. - Moved browser object initialization to a separate method (so that it's available to importing clients).
    
    - Added a "browser" parameter to download_url().
    
    svn path=/src/trunk/corpuscatcher/; revision=8958
    Walter Leibbrandt committed Nov 18, 2008
Commits on Nov 13, 2008
  1. Fixed a bug where only the last crawled URL (and its connections) are converted to text.
    
    svn path=/src/trunk/corpuscatcher/; revision=8921
    Walter Leibbrandt committed Nov 13, 2008
  2. Make corpuscatcher an importable module.

    svn path=/src/trunk/corpuscatcher/; revision=8920
    Walter Leibbrandt committed Nov 13, 2008
Commits on Aug 14, 2008
  1. Added support for handling more encodings.

    svn path=/src/trunk/corpuscatcher/; revision=8035
    Walter Leibbrandt committed Aug 14, 2008
Commits on Jul 17, 2008
  1. - Added -V/--version command-line argument

    - Added more specific settings to the mechanize.Browser object used for crawling
    
    svn path=/src/trunk/corpuscatcher/; revision=7785
    Walter Leibbrandt committed Jul 17, 2008
  2. Added -V/--version command-line argument.

    svn path=/src/trunk/corpuscatcher/; revision=7784
    Walter Leibbrandt committed Jul 17, 2008
  3. Documentation updated:

    - Added LICENSE and __version__.py
    - README points the reader to the README on the wiki.
    
    svn path=/src/trunk/corpuscatcher/; revision=7783
    Walter Leibbrandt committed Jul 17, 2008
Commits on Jul 16, 2008
  1. Fix copyright date

    svn path=/src/trunk/corpuscatcher/; revision=7769
    friedelwolff committed Jul 16, 2008
Commits on Jul 15, 2008
  1. Correct copyright dates

    svn path=/src/trunk/corpuscatcher/; revision=7756
    friedelwolff committed Jul 15, 2008
  2. Initial version of CorpusCatcher tools.

    svn path=/src/trunk/corpuscatcher/; revision=7755
    Walter Leibbrandt committed Jul 15, 2008