README
In Python, read the .80 file format, for 80legs web crawl results.

The URL and data are UTF-8 decoded.

From http://80legs.pbworks.com/Results:

For people interested in deserializing in other languages, the file format this creates and reads is:

<classID><versionID><URL-SIZE><URL><DATA-SIZE><DATA>

Note that:
* The last 4 items (<URL-SIZE><URL><DATA-SIZE><DATA>) repeat for each url/data pair.
* <classID>, <versionID>, <URL-SIZE>, and <DATA-SIZE> are encoded 32-bit integers.
* The url is encoded using UTF-8.
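
The layout described above can be sketched as a small reader. This is an illustration only, not the code in eightyformat.py: the function name read_80 is hypothetical, and the description does not state the byte order or signedness of the integers, so the sketch assumes signed little-endian values and takes the byte order as a parameter. It also assumes that <URL-SIZE> and <DATA-SIZE> count bytes. The data is returned as raw bytes; call .decode("utf-8") on it if you want text.

```python
import struct


def read_80(stream, byteorder="<"):
    """Yield (url, data) pairs from a binary stream in the .80 layout.

    byteorder is a struct prefix: "<" little-endian (assumed default),
    ">" big-endian. The format description does not say which is used.
    """
    int32 = struct.Struct(byteorder + "i")

    def read_int():
        raw = stream.read(int32.size)
        if not raw:
            return None  # clean end of stream
        if len(raw) < int32.size:
            raise ValueError("truncated 32-bit integer")
        return int32.unpack(raw)[0]

    def read_exact(size):
        if size < 0:
            raise ValueError("negative size field")
        raw = stream.read(size)
        if len(raw) != size:
            raise ValueError("truncated record")
        return raw

    # Header: <classID><versionID>, read once and otherwise ignored here.
    class_id = read_int()
    version_id = read_int()
    if class_id is None or version_id is None:
        raise ValueError("missing header")

    # Body: <URL-SIZE><URL><DATA-SIZE><DATA>, repeated until end of stream.
    while True:
        url_size = read_int()
        if url_size is None:
            break
        url = read_exact(url_size).decode("utf-8")
        data_size = read_int()
        if data_size is None:
            raise ValueError("missing data size")
        yield url, read_exact(data_size)
```

To use it, open the file in binary mode and iterate: for url, data in read_80(open(path, "rb")). If the URLs come out garbled or a size error is raised immediately, try byteorder=">".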