In Python, read the .80 file format, for 80legs web crawl results.

Both the URL and the data of each record are decoded from UTF-8.

From http://80legs.pbworks.com/Results:
    For people interested in deserializing in other languages, the file format this creates and reads is:
        <classID><versionID><URL-SIZE><URL><DATA-SIZE><DATA>
    Note that:
        * The last 4 items (<URL-SIZE><URL><DATA-SIZE><DATA>) repeat for each url/data pair.
        * <classID>, <versionID>, <URL-SIZE>, and <DATA-SIZE> are encoded 32-bit integers.
        * The url is encoded using UTF-8.
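
As an illustration of the layout quoted above, here is a minimal sketch of a reader. It is not the code in eightyformat.py; the function name read_80_records and its byte_order parameter are hypothetical. The quoted description does not state the byte order or signedness of the 32-bit integers, nor whether the sizes count bytes, so the sketch assumes signed big-endian integers and byte counts. Adjust byte_order if your files differ.

```python
import io
import struct


def read_80_records(stream, byte_order=">"):
    """Yield (url, data) pairs from a binary stream in the .80 layout.

    Assumptions, not confirmed by the quoted format description:
    integers are signed 32-bit, big-endian by default (byte_order=">"),
    and <URL-SIZE>/<DATA-SIZE> count bytes, not characters.
    """
    int32 = struct.Struct(byte_order + "i")

    def read_exact(n):
        # Read exactly n bytes or fail loudly on a truncated file.
        chunk = stream.read(n)
        if len(chunk) != n:
            raise EOFError("truncated .80 stream")
        return chunk

    # File header: <classID><versionID>. Read and otherwise ignored here.
    class_id = int32.unpack(read_exact(4))[0]
    version_id = int32.unpack(read_exact(4))[0]

    # Body: <URL-SIZE><URL><DATA-SIZE><DATA>, repeated until end of stream.
    while True:
        head = stream.read(4)
        if not head:
            return  # clean end of file between records
        if len(head) != 4:
            raise EOFError("truncated .80 stream")
        url_size = int32.unpack(head)[0]
        url = read_exact(url_size).decode("utf-8")
        data_size = int32.unpack(read_exact(4))[0]
        data = read_exact(data_size).decode("utf-8")
        yield url, data


# Build a tiny in-memory sample under the same assumptions and read it back.
def _pack_record(url, data):
    u, d = url.encode("utf-8"), data.encode("utf-8")
    return struct.pack(">i", len(u)) + u + struct.pack(">i", len(d)) + d


sample = (
    struct.pack(">ii", 1, 1)
    + _pack_record("http://example.com/", "hello")
    + _pack_record("http://example.org/", "")
)
records = list(read_80_records(io.BytesIO(sample)))
```

To read a real result file, open it in binary mode ("rb") and pass the file object to the function. Decoding the data as UTF-8 follows this README; if a crawl returns binary payloads, you would likely want to keep the raw bytes instead.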