Kenya Health Management Information System Final Report
- Project lead: Gregoire Lurton, UW Institute for Health Metrics and Evaluation
- Advisors: Abie Flaxman and Emmanuela Gakidou, UW Institute for Health Metrics and Evaluation
- eScience Liaison: Daniel Halperin, Director of Research - Scalable Analytics, UW eScience Institute
Every year, millions of dollars are spent on collecting data on health services in developing countries. This data then typically sits unused because of data access, reliability, and management issues. During this project, we worked with a set of over 5000 monthly reports collected from 2008 to 2012 by the Kenya Health Ministry. These reports are part of the Kenyan Health Management Information System (HMIS), through which hospitals regularly report on the main pathologies they treated and the activities they carried out. This dataset was collated manually and stored in a variety of Excel files, which makes it difficult to process and analyse. As a result, this type of routine data is seldom used for policy making or health system management.
Our aim was to make this data easily usable for data analysis. We developed a series of methods to 1) programmatically extract the data from Excel in order to automate access to thousands of spreadsheets while handling the quirks of manually-entered Excel data from a variety of report templates, 2) test the reliability of the data using a variety of new spreadsheet and data features, and 3) import the data into SQLShare in order to provide querying capabilities over the spreadsheet data. We also used Excel file metadata to cluster and classify the data.
- We used a variety of Python libraries that can parse Excel spreadsheets to extract the raw data for different report types. For each report type, we developed manual logic to deal with the ways in which humans deviated from the report templates during data entry.
- We used a variety of features to assess the reliability of individual reports. These included obvious rules such as "spreadsheets, rows, and columns that are blank or all 0 are likely to be invalid", as well as novel "data forensics" features such as Windows File System or Excel metadata, which we used to find anomalies in file modification dates that indicated likely data problems. We detected problems such as: files that were duplicated and renamed, but never modified; reports that were last changed before the time period they purport to measure; and reports that were produced or updated at a different time of year than the other reports of that type, indicating either partial updates or perhaps subsequent cleaning. We were also able to use these features in step (1) to map reports with non-standard file names to their report type.
- Once we were able to produce a reliable dataset in an automated way, we uploaded the data to SQLShare to make it readily available, and we analyzed it programmatically in R. The above figure shows one example of the rich data we now have access to. The figure presents a time series showing measles vaccination in Kenyan districts from 2008 to 2012. These series and others made available by the project are of primary interest to decision makers and public health professionals in Kenya.
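The reliability checks described in the list above can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names, the dictionary layout of a report (`name`, `rows`, `last_modified`, `period_start`), and the flag labels are all hypothetical, and it assumes the cell values and the last-modified timestamp have already been extracted from each Excel file. Duplicate detection is done here with a hash of the cell contents, which is a stand-in for the file-metadata comparison the project used.

```python
import hashlib
from datetime import date, datetime


def _is_empty(cell):
    """Treat None, blank strings, "0" strings, and numeric zero as empty."""
    if cell is None:
        return True
    if isinstance(cell, str):
        return cell.strip() in ("", "0")
    if isinstance(cell, (int, float)):
        return cell == 0
    return False


def is_blank_or_zero(rows):
    """True if every cell in the sheet is empty or zero (likely invalid)."""
    return all(_is_empty(cell) for row in rows for cell in row)


def content_digest(rows):
    """Hash the cell contents so renamed copies of a file compare equal."""
    h = hashlib.sha256()
    for row in rows:
        h.update(repr(tuple(row)).encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def modified_before_period(last_modified, period_start):
    """True if the file was last changed before the period it reports on."""
    return last_modified.date() < period_start


def flag_reports(reports):
    """Return {report name: list of problem labels} for a list of reports.

    Each report is a dict with the (hypothetical) keys:
    name, rows, last_modified (datetime), period_start (date).
    """
    flags = {}
    first_seen = {}  # content digest -> name of first report with that content
    for report in reports:
        problems = []
        if is_blank_or_zero(report["rows"]):
            problems.append("blank_or_zero")
        if modified_before_period(report["last_modified"], report["period_start"]):
            problems.append("modified_before_period")
        digest = content_digest(report["rows"])
        if digest in first_seen:
            problems.append("duplicate_of:" + first_seen[digest])
        else:
            first_seen[digest] = report["name"]
        flags[report["name"]] = problems
    return flags


if __name__ == "__main__":
    example = [
        {"name": "a.xls", "rows": [[1, 2], [3, 0]],
         "last_modified": datetime(2009, 2, 1), "period_start": date(2009, 1, 1)},
        {"name": "b.xls", "rows": [[1, 2], [3, 0]],
         "last_modified": datetime(2009, 2, 1), "period_start": date(2009, 1, 1)},
        {"name": "c.xls", "rows": [[0, None], ["", 0]],
         "last_modified": datetime(2008, 12, 1), "period_start": date(2009, 1, 1)},
    ]
    for name, problems in flag_reports(example).items():
        print(name, problems)
```

Keeping each check as a separate small function makes it easy to add further features, such as comparing the month of last modification against other reports of the same type.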
Our project used simple functions in Python, R and SQL. As such, the methods developed can easily be applied to other datasets of the same type. The issues encountered with this dataset are indeed similar to those encountered in other HMIS settings. Unusable HMIS data represents a huge loss for Ministries of Health in developing countries, which invest large amounts of money in setting up and managing data collection systems. The absence of usable data on health systems also leads different actors in Global Health (NGOs, international actors...) to invest millions of dollars in specific surveys and data collection campaigns. Providing simple tools to standardize HMIS data and make it easily usable is thus an important step towards increasing the efficiency of health systems in developing countries. Finally, the usability of Excel metadata for clustering and quality assessment of the data is an interesting result on which data managers and analysts can build in a variety of situations.
The final data set resulting from the project will be useful for a variety of Global Health research topics, from epidemiology to health services evaluation. Ongoing research includes:
- comparing the diagnosis and treatment of malaria in different zones of Kenya, as recorded in the data, to local [PfPR estimates](http://www.map.ox.ac.uk/browse-resources/endemicity/Pf_mean/world/)
- exploring differences in the rate of biological confirmation of malaria diagnoses
- gaining insight into the logistics and implementation of vaccination campaigns in different zones of Kenya
- comparing data from Kenya to comparable data from other African countries' HMIS
- evaluating and understanding the performance of Kenyan health services