Read the .80 file format used for 80legs web crawl results, in Python.

README


Both the URL and the data of each record are decoded from UTF-8.

From http://80legs.pbworks.com/Results:
    For people interested in deserializing in other languages, the file format this creates and reads is:
        <classID><versionID><URL-SIZE><URL><DATA-SIZE><DATA>
    Note that:
        * The last 4 items (<URL-SIZE><URL><DATA-SIZE><DATA>) repeat for each url/data pair.
        * <classID>, <versionID>, <URL-SIZE>, and <DATA-SIZE> are encoded 32-bit integers.
        * The url is encoded using UTF-8.
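The layout above can be read and written with the standard `struct` module. What follows is a minimal sketch, not this package's actual code. It makes several assumptions the quoted spec does not state: the 32-bit integers are signed and big-endian (the byte order Java's `DataOutputStream.writeInt` uses), `<URL-SIZE>` and `<DATA-SIZE>` count bytes of the encoded field, and, as this README describes, the data is also decoded as UTF-8 even though the spec only guarantees UTF-8 for the URL. The names `read_header`, `iter_records` and `write_eighty` are hypothetical, and the class/version IDs in the usage example are placeholders, not real 80legs values.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, Tuple

# Assumption: each integer is 4 bytes, signed, big-endian. The spec only
# says "encoded 32-bit integers".
_INT = struct.Struct(">i")


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, raising on a negative size or a truncated file."""
    if n < 0:
        raise ValueError(f"negative field size: {n}")
    buf = stream.read(n)
    if len(buf) != n:
        raise EOFError(f"expected {n} bytes, got {len(buf)}")
    return buf


def read_header(stream: BinaryIO) -> Tuple[int, int]:
    """Return (classID, versionID), which appear once at the start of the file."""
    class_id = _INT.unpack(_read_exact(stream, 4))[0]
    version_id = _INT.unpack(_read_exact(stream, 4))[0]
    return class_id, version_id


def iter_records(stream: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (url, data) pairs from the repeating part of the file."""
    while True:
        size_bytes = stream.read(4)
        if not size_bytes:  # clean end of file between records
            return
        if len(size_bytes) != 4:
            raise EOFError("truncated URL-SIZE field")
        url_size = _INT.unpack(size_bytes)[0]
        url = _read_exact(stream, url_size).decode("utf-8")
        data_size = _INT.unpack(_read_exact(stream, 4))[0]
        # Following this README, DATA is decoded as UTF-8 too (an assumption
        # about the payload; the spec only mentions UTF-8 for the URL).
        data = _read_exact(stream, data_size).decode("utf-8")
        yield url, data


def write_eighty(stream: BinaryIO, class_id: int, version_id: int,
                 records: Iterable[Tuple[str, str]]) -> None:
    """Write the header, then one <SIZE><BYTES> pair for each URL and data field."""
    stream.write(_INT.pack(class_id))
    stream.write(_INT.pack(version_id))
    for url, data in records:
        for field in (url.encode("utf-8"), data.encode("utf-8")):
            stream.write(_INT.pack(len(field)))
            stream.write(field)


if __name__ == "__main__":
    buf = io.BytesIO()
    # 1, 1 are placeholder class/version IDs for illustration only.
    write_eighty(buf, 1, 1, [("http://example.com/", "<html></html>")])
    buf.seek(0)
    print(read_header(buf))  # (1, 1)
    for url, data in iter_records(buf):
        print(url, len(data))
```

Keeping the header read separate from the record iterator mirrors the spec: `<classID><versionID>` occur once, while the last four items repeat until end of file. A file that ends partway through a record raises `EOFError` instead of silently yielding a short record.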