openaire/iis

About

Information Inference Service (IIS) is a flexible data processing system for handling big data, based on Apache Hadoop technologies. It is a subsystem of the OpenAIRE system (www.openaire.eu is its public web front-end); see Fig.1 for a high-level overview.

Fig.1: At the center of the OpenAIRE system is the Information Space, which stores all information available in the system. IIS ingests data from the Information Space, runs processing workflows, and produces inferred data, which is in turn ingested back into the Information Space.

The goal of OpenAIRE is to provide an infrastructure for gathering, processing (including de-duplication), and providing unified access to research-related data (papers, datasets, researchers, projects, etc.). The goal of IIS is to provide data and text mining functionality for the OpenAIRE system. In practice, IIS defines data processing workflows that connect various modules, each with well-defined input and output. A high-level overview of IIS can be found in the paper "Information Inference in Scholarly Communication Infrastructures: The OpenAIREplus Project Experience", Procedia Computer Science, vol. 38, 2014, 92-99.
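The idea of a workflow as a chain of modules with declared inputs and outputs can be illustrated with a small sketch. This is not IIS code: the `Module` class, the `run_workflow` function, and the two toy modules are hypothetical names invented for this example, and an actual IIS workflow runs on Hadoop rather than in a single Python process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Module:
    """Hypothetical processing step with explicitly declared inputs and outputs."""
    name: str
    inputs: Tuple[str, ...]    # names of the data sets the module reads
    outputs: Tuple[str, ...]   # names of the data sets the module produces
    run: Callable[[Dict[str, list]], Dict[str, list]]


def run_workflow(modules: List[Module], data: Dict[str, list]) -> Dict[str, list]:
    """Run modules in order, checking each one against its declared contract."""
    data = dict(data)  # work on a copy so the caller's dictionary is untouched
    for module in modules:
        missing = [port for port in module.inputs if port not in data]
        if missing:
            raise KeyError(f"{module.name}: missing input(s) {missing}")
        # A module only sees the inputs it declared.
        produced = module.run({port: data[port] for port in module.inputs})
        if set(produced) != set(module.outputs):
            raise ValueError(f"{module.name}: produced {sorted(produced)}, "
                             f"declared {sorted(module.outputs)}")
        data.update(produced)
    return data


# Two toy modules (illustrative only): split documents into tokens, then count them.
tokenize = Module(
    name="tokenize",
    inputs=("documents",),
    outputs=("tokens",),
    run=lambda d: {"tokens": [doc.lower().split() for doc in d["documents"]]},
)
count = Module(
    name="count",
    inputs=("tokens",),
    outputs=("counts",),
    run=lambda d: {"counts": [len(tokens) for tokens in d["tokens"]]},
)

result = run_workflow([tokenize, count],
                      {"documents": ["Open Access paper", "Research data"]})
print(result["counts"])
```

Because every module names what it reads and writes, a misconnected workflow fails with a clear error as soon as the faulty step is reached, instead of passing malformed data on to later steps.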

IIS was initially developed during the OpenAIREplus project and was further extended during the OpenAIRE2020 project.

The original code was migrated to GitHub from the D-NET SVN repository. The public read-only interface of that repository is available at https://svn-public.driver.research-infrastructures.eu/driver/dnet40/modules/, where you can find the history of the code base before the migration (the IIS-related Maven projects are the ones matching the glob pattern *-iis-*).
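As a rough illustration of how the glob pattern selects projects, the sketch below filters a list of directory names with Python's standard `fnmatch` module. The names in `module_names` are made up for the example and are not taken from the SVN repository.

```python
from fnmatch import fnmatchcase

IIS_PATTERN = "*-iis-*"  # glob pattern quoted in the text above

# Hypothetical project names, for illustration only.
module_names = [
    "dnet-iis-core",
    "dnet-openaire-iis-wf",
    "dnet-index-client",
    "iis-common",
]

# fnmatchcase performs a case-sensitive match of the whole name against the pattern.
iis_modules = [name for name in module_names if fnmatchcase(name, IIS_PATTERN)]
print(iis_modules)
```

Note that the pattern requires a hyphen on both sides of `iis`, so a name that merely starts with `iis-` does not match.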

Content of the most important subdirectories and files

  • docs - basic documentation
  • iis-core - generic common utilities used by other projects
  • iis-common - OpenAIRE-related common utilities
  • iis-wf - definitions of workflows used in the system
  • CONTRIBUTORS.markdown - list of contributors to the project

License

The code is licensed under the Apache License, version 2.0. We also use third-party code from other projects whose licenses are compatible with it. This third-party code can be found in directories with names starting with iis-3rdparty-; each directory corresponds to a different source project.