Nanite - a friendly swarm of format-identifying robots.

willp-bl/nanite

 
 

Nanite - a friendly swarm of format-identifying robots

The Nanite project builds on DROID and Apache Tika to provide a rich format identification and characterization system. It aims to make it easier to run identification and characterization at scale, and to compare and combine the results of different tools.

  • nanite-core contains the core identification code, a wrapped version of DROID that can parse InputStreams.
  • nanite-hadoop allows nanite-core identifiers to be run on web archives via Map-Reduce on Apache Hadoop. It depends on the (W)ARC Record Readers from the WAP codebase. It can also use Apache Tika and libmagic for identification. Files can be characterized using Tika, with the results output in a format suitable for importing into C3PO.
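
To illustrate why stream-based identification matters for the modules above, here is a minimal sketch of signature matching on an InputStream. It is not the nanite-core or DROID API: the class name `StreamSniffer`, the `identify` method, and the three-entry signature table are hypothetical and chosen only for illustration. DROID itself works from a much larger signature file rather than a hard-coded table.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of nanite-core or DROID. */
public class StreamSniffer {

    // A tiny hand-picked table mapping a MIME type to its leading magic bytes.
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("application/pdf", "%PDF-".getBytes(StandardCharsets.US_ASCII));
        SIGNATURES.put("image/png",
                new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A});
        SIGNATURES.put("image/gif", "GIF8".getBytes(StandardCharsets.US_ASCII));
    }

    /** Reads at most 16 bytes from the stream and matches them against the table. */
    public static String identify(InputStream in) {
        byte[] head = new byte[16];
        int filled = 0;
        try {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            byte[] magic = entry.getValue();
            if (filled >= magic.length
                    && Arrays.equals(Arrays.copyOf(head, magic.length), magic)) {
                return entry.getKey();
            }
        }
        // Fallback when no signature matches.
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(identify(new ByteArrayInputStream(sample))); // prints application/pdf
    }
}
```

Because the method needs only an InputStream, the same call could in principle be applied to the payload of each record in an archive file without first writing the payload to disk, which is the property that makes this style of identifier convenient inside a Map-Reduce job.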

Nanite has been used at scale; see this blog post.

Acknowledgements

This work was partially supported by the SCAPE project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).

Languages

  • Java 100.0%