# Healthcare

Healthcare as an industry encompasses both goods (e.g., medicines and equipment) and services (e.g., consultation or diagnostic testing) for curative, preventive, palliative, and rehabilitative care.

> NOTE: Curative care aims to cure a patient suffering from a curable disease, and preventive care aims to keep people from falling sick. Rehabilitative care helps patients recuperate from illness and includes activities like physical therapy. Palliative care focuses on improving the quality of life of patients suffering from terminal conditions.

NLP use cases in healthcare, by Chilmark Research

![alt text](https://learning.oreilly.com/library/view/practical-natural-language/9781492054047/assets/pnlp_1001.png)

# Health and Medical Records

A large proportion of health and medical data is often collected and stored in unstructured text formats. This includes medical notes, prescriptions, and audio transcripts, as well as pathology and radiology reports.

NLP can help doctors search and analyze this data more effectively and even automate some workflows. One example is building question-answering systems that reduce the time needed to look up relevant patient information.

# Patient Prioritization and Billing

NLP techniques can be applied to physician notes to assess a patient's condition and urgency, which helps prioritize health procedures and checkups. This can reduce delays and administrative errors and automate parts of the process. Similarly, parsing unstructured notes to extract the relevant medical codes can facilitate billing.
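
As a minimal sketch of the billing step, the snippet below searches a note for phrases from a small hand-written dictionary and returns the matching ICD-10 category codes in the order the phrases appear. `CODE_LEXICON` and `extract_codes` are hypothetical illustrations. A real system would use the full coding terminology and an NLP model that handles negation ("no hypertension" would still match here), abbreviations, and misspellings.

```python
import re

# Hypothetical, tiny lexicon mapping surface phrases to ICD-10 category codes.
# A real system would draw on the complete ICD-10 terminology.
CODE_LEXICON = {
    "type 2 diabetes": "E11",
    "hypertension": "I10",
    "asthma": "J45",
}


def extract_codes(note: str) -> list[tuple[str, str]]:
    """Return (phrase, code) pairs for lexicon phrases found in the note, in text order."""
    lowered = note.lower()
    matches = []
    for phrase, code in CODE_LEXICON.items():
        # Word boundaries stop "asthma" from matching inside "asthmatic".
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", lowered):
            matches.append((m.start(), phrase, code))
    matches.sort()  # order by position in the note
    return [(phrase, code) for _, phrase, code in matches]


note = "Pt with long-standing hypertension, newly diagnosed type 2 diabetes."
print(extract_codes(note))  # [('hypertension', 'I10'), ('type 2 diabetes', 'E11')]
```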

# Pharmacovigilance

Pharmacovigilance covers all the activities needed to ensure that a drug is safe, including the collection, detection, and monitoring of adverse drug reactions. A medical procedure or drug can have unintended or noxious effects, and monitoring and preventing these effects is essential to making sure the drug acts as intended. As social media use grows, more of these side effects are mentioned in social media posts, so monitoring and identifying such mentions is part of the solution.
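
A crude first filter for social media monitoring flags posts that mention both a drug and an adverse-effect term. The sketch below uses hypothetical word lists (`DRUGS`, `ADVERSE_EFFECTS`). Co-occurrence alone does not show that the drug caused the effect, so flagged posts would normally go to a trained classifier or to human review.

```python
import re

# Hypothetical lexicons; real systems use curated drug and adverse-event vocabularies.
DRUGS = {"ibuprofen", "metformin"}
ADVERSE_EFFECTS = {"nausea", "dizziness", "rash", "headache"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def flag_post(text: str) -> dict:
    """Report which drugs and adverse-effect terms co-occur in a single post."""
    tokens = set(tokenize(text))
    drugs = sorted(tokens & DRUGS)
    effects = sorted(tokens & ADVERSE_EFFECTS)
    return {"drugs": drugs, "effects": effects, "flagged": bool(drugs and effects)}


posts = [
    "Started metformin last week and the nausea is unreal",
    "Took ibuprofen for my knee, feeling fine",
    "Terrible headache all day",
]
for post in posts:
    print(flag_post(post))
```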

# Clinical Decision Support Systems

Decision support systems assist medical workers in making healthcare-related decisions. These include screening, diagnosis, treatments, and monitoring. Various text data can be used as an input to these systems, including electronic health records, column-tabulated laboratory results, and operative notes.

# Health Assistants

Health assistants and chatbots can improve the patient and caregiver experiences by using various aspects of expert systems and NLP. 

# Electronic Health Records

The growing practice of storing clinical and healthcare data electronically has led to an explosion of medical data and overwhelmingly large personal records. As records grow longer and cover more history, doctors and clinical staff find it harder to locate what they need, which leads to information overload. This, in turn, causes more errors, omissions, and delays, and affects patient safety.

## HARVEST: Longitudinal Report Understanding

HARVEST parses all of a patient's medical data to make it easy to analyze, and it can sit on top of any medical system. The figure below shows HARVEST running on the iNYP system, where it replaces the formerly text-heavy reporting format with a redesigned visual depiction.

We can see a timeline of each visit to the clinic or hospital. It’s accompanied by a word cloud of important medical conditions for the patient in the given time range. The user can drill down to detailed notes and history if needed as well. All of this is also supported by summaries of each report so a user can get the gist of a patient’s medical history quickly. HARVEST is much more than a reformatted novelty—it’s extremely useful for giving not just doctors, but also general medical staff and caregivers, a near real-time, informative snapshot of what’s going on with a patient.

![alt text](https://learning.oreilly.com/library/view/practical-natural-language/9781492054047/assets/pnlp_1006.png)
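
How HARVEST is implemented isn't described here. Still, the two views above, a chronological timeline of visits and a word cloud weighted by how often each condition appears in a chosen time range, come down to simple aggregations once the conditions have been extracted from the notes. The minimal sketch below assumes hypothetical visit records whose condition lists came from an upstream NLP step.

```python
from collections import Counter
from datetime import date

# Hypothetical visit records: (visit date, conditions mentioned in that visit's notes).
# In practice the condition lists would come from an NLP extraction step.
visits = [
    (date(2019, 1, 10), ["hypertension", "chest pain"]),
    (date(2019, 6, 2), ["hypertension", "type 2 diabetes"]),
    (date(2020, 3, 15), ["type 2 diabetes", "hypertension", "neuropathy"]),
]


def condition_counts(visits, start: date, end: date) -> Counter:
    """Count condition mentions across visits dated within [start, end]; feeds a word cloud."""
    counts = Counter()
    for visit_date, conditions in visits:
        if start <= visit_date <= end:
            counts.update(conditions)
    return counts


def timeline(visits) -> list[str]:
    """One line per visit in chronological order, for a timeline view."""
    return [f"{d.isoformat()}: {', '.join(c)}" for d, c in sorted(visits)]


counts = condition_counts(visits, date(2019, 1, 1), date(2019, 12, 31))
print(counts.most_common())
print("\n".join(timeline(visits)))
```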

HARVEST delivers understandable summaries and conclusions by collating a patient’s history of healthcare issues across their lifetime. Its unique selling point is that it can mine, extract, and visually present content at a macro level—based on detailed micro-level observations—irrespective of where and by whom in the hospital a patient might have been seen. Such systems can be built to visualize and analyze a large amount of information. When the underlying knowledge base is unstructured text, as it is in the case of EHRs, NLP techniques play a key role in such analytics and information visualization tools.

![alt text](https://learning.oreilly.com/library/view/practical-natural-language/9781492054047/assets/pnlp_1007.png)

# Question Answering for Health

Our focus here is on the nuances of questions that arise specifically in healthcare scenarios. For example, these questions can include:
- What dosage of a particular medicine is a patient required to take?
- For what ailment is a particular medication taken?
- What were the results of a medical test?
- By how much was the result of a medical test out of range for a given test date?
- What lab test confirmed a particular disease?
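
Once the relevant facts have been extracted, some of these questions need a small computation over structured values. The minimal sketch below answers the out-of-range question. It assumes the lab values and a reference interval have already been pulled from the record. The data, the interval, and the helper names are hypothetical.

```python
from datetime import date

# Hypothetical lab history for one patient: test date -> fasting glucose in mg/dL.
glucose_results = {
    date(2021, 2, 1): 92.0,
    date(2021, 8, 3): 126.0,
}
# Illustrative reference interval; real ranges depend on the lab and the test.
GLUCOSE_RANGE = (70.0, 99.0)


def out_of_range(value: float, low: float, high: float) -> float:
    """Signed distance outside [low, high]: negative below, positive above, 0.0 inside."""
    if value < low:
        return value - low
    if value > high:
        return value - high
    return 0.0


def deviation_on(results, test_date, reference) -> float:
    """Answer 'By how much was the result out of range on this test date?'"""
    low, high = reference
    return out_of_range(results[test_date], low, high)


print(deviation_on(glucose_results, date(2021, 8, 3), GLUCOSE_RANGE))  # 27.0
```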

A general framework for creating such a question-answering dataset, on which a QA system can then be built, consists of three steps:
1. Collecting domain-specific questions and then normalizing them. For instance, a patient’s treatment can be asked about in multiple ways, like “How was the problem managed?” or “What was done to correct the patient’s problem?” All of these have to be normalized to the same logical form.
2. Using expert domain knowledge to map questions to templates and assign logical forms to them. A question template is an abstract question that expects a certain type of response, such as a number or a medication type. More concretely, the template “What is the dosage of |medication|?” maps to exact questions like “What is the dosage of Nitroglycerin?” Its logical form expects a dosage as the response.
3. Using existing annotations and the information collected in steps 1 and 2 to create a range of question-and-answer pairs. Here, already available information, like named entity (NE) tags and the answer types linked to each logical form, is used to bootstrap the data. This step is especially valuable because it reduces the manual effort needed to create the QA dataset.

QA dataset generation using existing annotations
![alt text](https://learning.oreilly.com/library/view/practical-natural-language/9781492054047/assets/pnlp_1009.png)
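
The three steps can be sketched with hypothetical templates and annotations. Each template bundles paraphrases that share one logical form (step 1), a `|slot|` placeholder and a logical form (step 2), and the annotation attribute that supplies the answer (step 3). The logical-form strings, placeholder syntax, and annotation fields are illustrative assumptions, not the format of any particular dataset.

```python
from dataclasses import dataclass


@dataclass
class Template:
    questions: list[str]  # paraphrases normalized to one logical form (step 1)
    logical_form: str     # hypothetical logical-form string (step 2)
    slot: str             # entity type that fills the |slot| placeholder
    answer_attr: str      # annotation attribute that supplies the answer (step 3)


TEMPLATES = [
    Template(["What is the dosage of |medication|?",
              "How much |medication| does the patient take?"],
             "dosage_of(|medication|)", "medication", "dosage"),
    Template(["Why is the patient taking |medication|?"],
             "reason_for(|medication|)", "medication", "reason"),
]

# Hypothetical existing annotations for one note: NE tags with linked attributes.
annotations = [
    {"type": "medication", "text": "Nitroglycerin", "dosage": "0.4 mg", "reason": "chest pain"},
    {"type": "medication", "text": "Metformin", "dosage": "500 mg"},
]


def generate_qa(templates, annotations):
    """Instantiate every template with every matching annotation that has an answer."""
    pairs = []
    for t in templates:
        placeholder = f"|{t.slot}|"
        for ann in annotations:
            if ann["type"] != t.slot or t.answer_attr not in ann:
                continue  # no annotation supplies an answer for this template
            form = t.logical_form.replace(placeholder, ann["text"])
            for q in t.questions:
                pairs.append((q.replace(placeholder, ann["text"]), form, ann[t.answer_attr]))
    return pairs


for pair in generate_qa(TEMPLATES, annotations):
    print(pair)
```

Because the Metformin annotation has no `reason` attribute, no "Why is the patient taking Metformin?" pair is generated. The bootstrapped dataset only contains questions that the existing annotations can answer.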

# Outcome Prediction and Best Practices

Health outcomes are a set of attributes that describe the consequences of a disease for a patient, including how quickly and how completely the patient recovers. They're also important for measuring the efficacy of different treatments. This [work](https://ai.googleblog.com/2018/05/deep-learning-for-electronic-health.html) on deep learning with electronic health records is a collaboration between Google AI, Stanford Medicine, and UCSF.

Another focus of this work on scalable and accurate deep learning with electronic health records is building models and systems that are both scalable and highly accurate. Scalability is necessary because healthcare inputs are diverse: data collected at one hospital or department can differ from data collected at another, so it should be simple to retrain the system for a different outcome or a different hospital. Accuracy is necessary to avoid raising too many false alarms, and its importance is obvious in healthcare, where people's lives are on the line.

Interpretability is also important in healthcare: a model should show why it suggested a particular outcome. Without interpretability, it's hard for doctors to incorporate a model's results into their diagnoses. To achieve this, attention, a deep learning mechanism, is used to reveal which data points and events in the record matter most for an outcome.

An example of attention applied to health records.
![alt text](https://learning.oreilly.com/library/view/practical-natural-language/9781492054047/assets/pnlp_1010.png)
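
To see how attention yields per-event importance, consider this minimal sketch of generic scaled dot-product attention for a single query. It is not the model from the Google work. The event names, 2-dimensional embeddings, and query vector are hypothetical. The softmax weights sum to 1 and can be ranked to show which events drove a prediction, which is what the highlighted record above visualizes.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for one query; returns (weights, context vector)."""
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, context


# Hypothetical embeddings for events in one patient's record.
events = ["note: 'pleural effusion'", "lab: creatinine", "med: vancomycin"]
keys = [[2.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
query = [1.0, 0.0]  # hypothetical learned query for the outcome being predicted

weights, context = attend(query, keys, keys)
for event, w in sorted(zip(events, weights), key=lambda pair: -pair[1]):
    print(f"{w:.2f}  {event}")
```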

The Google AI team also proposed best practices for building ML models for healthcare, covering every stage of the machine learning life cycle, from defining the problem and collecting data to validating the results.

# Mental Healthcare Monitoring

With social media usage at an all-time high, it's increasingly possible to use social media signals to track the emotional state and mental balance of both individuals and groups. It should also be possible to gain insights into these aspects across demographic groups, such as those defined by age and gender.

Evaluating an individual's mental well-being involves many aspects. As an illustrative example, the study by Glen Coppersmith et al. focuses on using social media to identify individuals at risk of suicide. The study aimed to develop an early warning system and to identify the root causes of these issues.

The study identified and evaluated 554 users who stated that they had attempted to take their lives. Of these, 312 gave an explicit indication of their latest suicide attempt. Private profiles were excluded, and only public data was examined, which does not include direct messages or deleted posts.

Each user’s tweets were analyzed with the following perspectives:
- Is the user’s statement of attempting to take their life apparently genuine?
- Is the user speaking about their own suicide attempt?
- Is the suicide attempt localizable in time?

In the figure below, the top two tweets refer to genuine suicide attempts, while the bottom two are sarcastic or false statements. The middle two are examples in which an explicit date for the suicide attempt is mentioned.

![alt text](https://learning.oreilly.com/library/view/practical-natural-language/9781492054047/assets/pnlp_1011.png)

The data was analyzed in the following steps:
1. **Preprocessing:** Social media data was normalized and cleaned.
2. **Character models:** Character n-gram–based models followed by logistic regression were used to classify tweets. Performance was measured with 10-fold cross-validation.
3. **Emotional states:** To estimate the emotional content of tweets, a labeled dataset was bootstrapped using hashtags. For instance, all tweets containing #anger but neither #sarcasm nor #jk were assigned the anger label. Tweets with no emotional content were labeled No Emotion.
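
The hashtag bootstrapping in step 3 can be sketched as a labeling function. The emotion tag set, the rule that a tweet without emotion tags counts as "no emotion", and the decision to drop tweets with conflicting tags are all assumptions made for illustration. The hashtags are also stripped from the text, so a classifier trained on the result cannot simply learn the label tag.

```python
import re

# Hypothetical emotion hashtags; tweets with any exclusion tag are discarded.
EMOTION_TAGS = {"#anger": "anger", "#joy": "joy", "#sadness": "sadness", "#fear": "fear"}
EXCLUDE_TAGS = {"#sarcasm", "#jk"}


def hashtags(tweet: str) -> set[str]:
    """Lowercased hashtags in the tweet."""
    return {t.lower() for t in re.findall(r"#\w+", tweet)}


def bootstrap_example(tweet: str):
    """Return (text, label) derived from hashtags, or None if the tweet should be dropped."""
    tags = hashtags(tweet)
    if tags & EXCLUDE_TAGS:
        return None  # sarcastic or joking: the emotion tag is unreliable
    labels = {EMOTION_TAGS[t] for t in tags if t in EMOTION_TAGS}
    if len(labels) > 1:
        return None  # conflicting emotion tags
    label = labels.pop() if labels else "no_emotion"
    # Remove hashtags and collapse whitespace so the label tag is not a feature.
    text = " ".join(re.sub(r"#\w+", " ", tweet).split())
    return text, label


for tweet in ["Stuck in traffic again #anger", "Great, another Monday #anger #sarcasm", "Lunch at noon"]:
    print(bootstrap_example(tweet))
```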

Confusion matrix for emotion classification

![alt text](https://learning.oreilly.com/library/view/practical-natural-language/9781492054047/assets/pnlp_1012.png)
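
A confusion matrix like the one above tallies (gold, predicted) label pairs. Rows are true labels and columns are predicted labels, so the diagonal holds correct predictions. The minimal sketch below uses made-up labels and predictions, not the study's data.

```python
from collections import Counter


def confusion_matrix(gold, predicted, labels):
    """Rows are gold labels, columns are predicted labels, cells are counts."""
    counts = Counter(zip(gold, predicted))
    return [[counts[(g, p)] for p in labels] for g in labels]


labels = ["anger", "joy", "no_emotion"]
gold = ["anger", "anger", "joy", "no_emotion", "joy"]
predicted = ["anger", "joy", "joy", "no_emotion", "no_emotion"]

matrix = confusion_matrix(gold, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>10} {row}")
```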

# Medical Information Extraction and Analysis

Medical information extraction (IE) helps to identify clinical syndromes, medical conditions, medication, dosage, strength, and common biomedical concepts from health records, radiology reports, and discharge summaries, as well as nursing documentation and medical education documents.

Amazon Comprehend Medical is a cloud API for processing medical text. It supports medical named entity and relationship extraction as well as medical ontology linking.
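
Entity extraction with the service can be sketched using boto3's `comprehendmedical` client and its `detect_entities_v2` call. The region default and the `min_score` cutoff are assumptions, and `make_client` and `medical_entities` are hypothetical helper names. The helper takes the client as an argument, so the parsing logic can be exercised without AWS access.

```python
def make_client(region_name: str = "us-east-1"):
    """Create a Comprehend Medical client (requires boto3 and configured AWS credentials)."""
    import boto3  # imported lazily so medical_entities can be used without boto3 installed
    return boto3.client("comprehendmedical", region_name=region_name)


def medical_entities(client, text: str, min_score: float = 0.5):
    """Call DetectEntitiesV2 and keep (text, category, type, score) for confident entities."""
    response = client.detect_entities_v2(Text=text)
    return [
        (e["Text"], e["Category"], e["Type"], e["Score"])
        for e in response["Entities"]
        if e["Score"] >= min_score
    ]
```

With credentials configured, calling `medical_entities(make_client(), "Patient was prescribed 40 mg of atorvastatin daily.")` returns the detected entities. Medications come back under the `MEDICATION` category, and attributes such as dosage are attached to each entity in the raw response.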

To build better models for biomedical data, BERT for Biomedical Text (BioBERT) was created. It adapts BERT to biomedical text to improve performance on biomedical text-mining tasks. In the domain adaptation phase, the model is initialized with the weights of a standard BERT model and then further pretrained on biomedical texts, including PubMed abstracts and PMC full-text articles. PubMed is a search engine for biomedical literature.

BioBERT

![alt text](https://learning.oreilly.com/library/view/practical-natural-language/9781492054047/assets/pnlp_1014.png)
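
BioBERT keeps BERT's architecture and continues BERT's masked language modeling pretraining on biomedical text. The sketch below shows the input corruption at the heart of that objective. Each token is selected with probability 0.15. A selected token is replaced by `[MASK]` 80% of the time and by a random vocabulary token 10% of the time, and is left unchanged the remaining 10%. Whitespace tokens stand in for WordPiece tokens as a simplifying assumption, and `mask_tokens` is a hypothetical name.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """BERT-style masking; returns (inputs, labels), with labels None where nothing is predicted."""
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # token not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise keep the original token unchanged
    return inputs, labels


sentence = "the patient was treated with oral metformin for type 2 diabetes".split()
vocab = sorted(set(sentence))
inputs, labels = mask_tokens(sentence, vocab, random.Random(0))
print(inputs)
print(labels)
```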

[Google Colab notebook for BioBERT](https://colab.research.google.com/drive/1o3DxuIcQ9LTjwkEk6Db2RGyJqoGC5b55?usp=sharing)