Create and use de-identified databases for research.
- Anonymises relational databases.
- Extracts and de-identifies text from associated binary files.
- Performs some specific preprocessing tasks; e.g.
- preprocesses some specific databases (e.g. Servelec RiO EMR);
- drafts a "data dictionary" for anonymisation, with special knowledge of some databases (e.g. TPP SystmOne);
- fetches some word lists, e.g. forenames/surnames/eponyms.
- Provides tools to link databases, including via Bayesian personal identity matching, in identifiable or de-identified fashion.
- Provides a natural language processing (NLP) pipeline, including built-in NLP, support for external tools, and client/server support for the Natural Language Processing Request Protocol (NLPRP).
- Web app for
- querying the anonymised database;
- providing a de-identification API;
- managing a consent-to-contact process.
See https://crateanon.readthedocs.io
- Python package: https://pypi.org/project/crate-anon/
- Source code: https://github.com/ucam-department-of-psychiatry/crate
- Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
- Licensed under the GNU GPL v3+: see LICENSE file.
- Some third-party libraries have slightly different licences; see the documentation.