Commits on Jul 8, 2013
Commits on Jul 7, 2013
  1. Resolve DNS explicitly

    committed Jul 7, 2013
  2. log crawling speed

    committed Jul 7, 2013
  3. util.HasPort

    committed Jul 7, 2013
  4. camel case

    committed Jul 7, 2013
Commits on Oct 12, 2012
Commits on May 9, 2012
  1. setup.py version and other metadata cleanup: only in setup script

    Now it should install in an empty virtualenv.
    
    Added get_git_version script, thanks to Douglas Creager <dcreager@dcreager.net>
    committed May 9, 2012
Commits on May 24, 2011
Commits on Jan 23, 2011
  1. io-worker: new field in result JSON: 'key'

    This field is set to the original URL, as read on stdin.
    The 'url' field contains the last redirect URL.
    When no redirects were followed, key == url.
    committed Jan 23, 2011
  2. io-worker: reuse string(line) on parse errors

    very minor optimization
    committed Jan 23, 2011
  3. cosmetic

    committed Jan 23, 2011
Commits on Jan 17, 2011
  1. io-worker: cache each fetched url independently.

    Now 3 redirects will make 3 cache entries. Previous code would make one cache entry for any number of redirects.
    committed Jan 17, 2011
  2. io-worker: use flag package for runtime configuration via command line options.

    The skip-robots flag now requires a dash in front (io-worker -skip-robots).
    The IO concurrency limit is now configured via the command line argument -jobs, not an environment variable.
    Other options available: redirects, keepalive, io-timeout, total-timeout.
    committed Jan 17, 2011
Commits on Jan 12, 2011
  1. cosmetic

    committed Jan 12, 2011
  2. io-worker: sync with new robotstxt: use FromResponseBytes

    Even less excessive string copying now.
    committed Jan 12, 2011
Commits on Jan 11, 2011
  1. cosmetic, gofmt

    committed Jan 11, 2011
  2. io-worker: FetchResult.Body is now []byte (was string)

    One fewer conversion to string() (unless it's robots.txt).
    committed Jan 11, 2011
  3. cosmetic, typo

    committed Jan 11, 2011
  4. New stats column: total_time.

    Earlier versions called this value fetch_time. Now fetch_time is what it is supposed to be:
    time spent in network IO for the request/response. total_time includes all overheads, such as
    cache, /robots.txt requests, parsing, tests, waiting for locks, etc.
    committed Jan 11, 2011
  5. io-worker: test evil server with slow responses.

    I used it to test timeouts.
    committed Jan 11, 2011
  6. io-worker: make number of redirects to follow configurable

    Not at runtime, just in code. Yet.
    committed Jan 11, 2011
  7. io-worker: make KeepAlive (stale) timeout configurable

    Not at runtime, just in code, yet.
    committed Jan 11, 2011
  8. cosmetic, typo fix

    committed Jan 11, 2011
  9. io-worker: rewrite of fetch library. Socket read/write timeouts can now be set.

    Extracted Connect, SendRequest, GetResponse into separate methods. Added FetchWithTimeout.
    
    The library is still not extracted into a separate package, but source-wise it should be ready for that.
    committed Jan 11, 2011