branch: master
Commits on Mar 22, 2012
  1. Alexander Dupuy
  2. Alexander Dupuy
  3. Alexander Dupuy

    Reformat README as Markdown/reStructuredText

    dupuy authored
    (Compatible with either markup syntax, using the common
    subset documented at https://gist.github.com/1855764 -
    note however that GitHub Markdown numbers the sections
    incorrectly, so this should be renamed README.rst).
Commits on Mar 20, 2012
  1. Added .gitignore

    julen authored
Commits on Jun 23, 2011
  1. Removed the feature that counts downloaded files

    Laurette Pretorius authored
    svn path=/src/trunk/corpuscatcher/; revision=17552
Commits on Jun 6, 2011
  1. svn path=/src/trunk/corpuscatcher/; revision=17530

    Laurette Pretorius authored
Commits on Jun 3, 2011
  1. Tries to align files in two folders - src and tgt language - using html structure, numbers and url correspondence

    Laurette Pretorius authored
    svn path=/src/trunk/corpuscatcher/; revision=17504
  2. Improved pattern matching for URLs by using regexes

    Laurette Pretorius authored
    svn path=/src/trunk/corpuscatcher/; revision=17503
Commits on May 23, 2011
  1. Added an option (-e) to specify a pattern to be matched in the URLs to be downloaded.

    Laurette Pretorius authored
    svn path=/src/trunk/corpuscatcher/; revision=17438
Commits on May 16, 2011
  1. Fixed a bug related to selecting encodings for html files

    Laurette Pretorius authored
    svn path=/src/trunk/corpuscatcher/; revision=17401
Commits on Apr 6, 2011
  1. Assume that immediately consecutive lines are part of the same paragraph and join them. Split paragraphs in our outputs by two newlines.

    friedelwolff authored
    svn path=/src/trunk/corpuscatcher/; revision=17333
  2. Some cleanup, simplification, reordering

    friedelwolff authored
    svn path=/src/trunk/corpuscatcher/; revision=17332
  3. Better support for non-list output (output as running text)

    friedelwolff authored
    svn path=/src/trunk/corpuscatcher/; revision=17322
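
The paragraph handling described in the Apr 6, 2011 commits (join immediately consecutive lines into one paragraph, separate paragraphs by two newlines) could look roughly like the sketch below. This is only an illustration of the described behaviour; the function name `join_paragraphs` and the choice to join lines with a single space are assumptions, not taken from the CorpusCatcher source.

```python
def join_paragraphs(text):
    """Join consecutive non-blank lines into paragraphs.

    Hypothetical helper: blank lines are treated as paragraph
    boundaries, and paragraphs are emitted separated by two newlines.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Still inside the same paragraph: collect the line.
            current.append(stripped)
        elif current:
            # Blank line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Flush the last paragraph if the text did not end with a blank line.
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

For example, `join_paragraphs("first line\nsecond line\n\nnext paragraph")` yields two paragraphs, the first being `first line second line`.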
Commits on Nov 19, 2008
  1. Suppress unnecessary warning about having the browser handle gzipped data

    Walter Leibbrandt authored
    svn path=/src/trunk/corpuscatcher/; revision=8967
Commits on Nov 18, 2008
  1. Don't convert pages if there's nothing to convert.

    Walter Leibbrandt authored
    svn path=/src/trunk/corpuscatcher/; revision=8959
  2. - Moved browser object initialization to a separate method (so that it's available to importing clients).

    Walter Leibbrandt authored
    - Added a "browser" parameter to download_url().
    
    svn path=/src/trunk/corpuscatcher/; revision=8958
Commits on Nov 13, 2008
  1. Fixed a bug where only the last crawled URL (and its connections) are converted to text.

    Walter Leibbrandt authored
    svn path=/src/trunk/corpuscatcher/; revision=8921
  2. Make corpuscatcher an importable module.

    Walter Leibbrandt authored
    svn path=/src/trunk/corpuscatcher/; revision=8920
Commits on Aug 14, 2008
  1. Added support for handling more encodings.

    Walter Leibbrandt authored
    svn path=/src/trunk/corpuscatcher/; revision=8035
Commits on Jul 17, 2008
  1. - Added -V/--version command-line argument

    Walter Leibbrandt authored
    - Added more specific settings to the mechanize.Browser object used for crawling
    
    svn path=/src/trunk/corpuscatcher/; revision=7785
  2. Added -V/--version command-line argument.

    Walter Leibbrandt authored
    svn path=/src/trunk/corpuscatcher/; revision=7784
  3. Documentation updated:

    Walter Leibbrandt authored
    - Added LICENSE and __version__.py
    - README points the reader to the README on the wiki.
    
    svn path=/src/trunk/corpuscatcher/; revision=7783
Commits on Jul 16, 2008
  1. Fix copyright date

    friedelwolff authored
    svn path=/src/trunk/corpuscatcher/; revision=7769
Commits on Jul 15, 2008
  1. Correct copyright dates

    friedelwolff authored
    svn path=/src/trunk/corpuscatcher/; revision=7756
  2. Initial version of CorpusCatcher tools.

    Walter Leibbrandt authored
    svn path=/src/trunk/corpuscatcher/; revision=7755