# Week 6: Named Entity Recognition

- Introduction to named entity recognition and its types
- Understanding of Named Entity Recognition (NER) and its applications
- Introduction to different NER techniques such as rule-based, statistical and neural models
- Understanding of Named Entity Disambiguation (NED) and its applications
- Understanding of Named Entity Linking (NEL) and its applications
- Implementing Named Entity Recognition models using PyTorch or TensorFlow
- Understanding of evaluation metrics for Named Entity Recognition
- Introduction to pre-trained models such as BERT and its fine-tuning for NER tasks
- Understanding of transfer learning for Named Entity Recognition
- Understanding of active learning and its application in Named Entity Recognition
- Understanding of unsupervised techniques for Named Entity Recognition
- Understanding the role of Named Entity in NLP tasks such as Text summarization, Text generation and Machine Translation
- Understanding of data preparation and data cleaning for Named Entity Recognition tasks
- Understanding the role of ensemble models in Named Entity Recognition

# Named entity recognition and its types

## What is Named Entity Recognition?

Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP). Its primary objective is to find named entities in structured and unstructured text and classify them into predefined categories. Common categories include person names, locations, companies, times, monetary values, and events.

In a nutshell, NER deals with:

- Named entity recognition/detection – Identifying a word or series of words in a document.
- Named entity classification – Classifying every detected entity into predefined categories.

<img src="images/NER.png" width="500px" height="500px">

Image source:[Link to source](https://pbs.twimg.com/media/E78RgJrWEAI4Inb?format=png&name=small)


<img src="images/app with discription.png" width="500px" height="500px">

Image source:[Link to source](https://devopedia.org/images/article/256/8660.1580659054.png)



<img src="images/NER types.png" width="500px" height="500px">

Image source:[Link to source](https://lh3.googleusercontent.com/GS2laY6y-wSeQqGmbFlgTZ57Cyid2E5Q0awfy9X5EjIKnmmm0GvgSFMHfYHUR027Rip1karv-9VVxHGiso5AdN4eyhmeOyq_8L8FWYDk7I17BIbInSCwJ9dMBcuqXkbeM-tRX1-z)


Some of the common examples of a predetermined entity categorization are:

- Person: Michael Jackson, Oprah Winfrey, Barack Obama, Susan Sarandon

- Location: Canada, Honolulu, Bangkok, Brazil, Cambridge

- Organization: Samsung, Disney, Yale University, Google

- Time: 15.35, 12 PM

Other categories include numerical values, expressions, e-mail addresses, and facilities.

## Ambiguity in Named Entity Recognition

The category a term belongs to is usually intuitive for human beings. That is not the case for computers, which run into classification problems. For example:

- "Manchester City (Organization) won the Premier League trophy."
- "Manchester City (Location) was a textile and industrial powerhouse."

The same surface form is an organization in the first sentence and a location in the second, so the model has to rely on context to choose the category.

Your NER model needs training data that resembles the text it will see in order to extract and classify entities accurately. If you train your model on Shakespearean English, it won’t be able to decipher Instagram.

<img src="images/Common-Examples-of-NER.jpg" width="500px" height="500px">

Image source:[Link to source](https://www.shaip.com/wp-content/uploads/2022/02/Blog_Common-Examples-of-NER_500x350.jpg)


<img src="images/few-nerd.png" width="500px" height="500px">

Image source:[Link to source](https://production-media.paperswithcode.com/datasets/few-nerd.png)



# Applications

Let’s discuss some of the interesting use cases of Named Entity Recognition:
- Customer Support 

<img src="images/Customer-Service app 1.png" width="500px" height="500px">

Image source:[Link to source](https://blog.happyfox.com/wp-content/uploads/2020/10/Customer-Service-Vs-Customer-Support-Vs-Customer-Success.png)

- Gain Insights from Customer feedback

<img src="images/app 2 listen.jpg" width="500px" height="500px">

Image source:[Link to source](https://zuyder.files.wordpress.com/2014/06/listen1.jpg)

- Recommendation System

<img src="images/app 3 recomendation.png" width="500px" height="500px">

Image source:[Link to source](https://www.mdpi.com/applsci/applsci-10-05510/article_deploy/html/images/applsci-10-05510-g001.png)

- Summarizing Resume

<img src="images/resume_summary app 4.jpg" width="500px" height="500px">

Image source:[Link to source](https://cdn-images.zety.com/pages/resume_summary_on_a_template_dark.jpg)




# Different NER techniques such as rule-based, statistical and neural models

<img src="images/types of ner.png" width="500px" height="500px">

Image source:[Link to source](https://www.turing.com/kb/how-to-train-custom-ner-model-using-spacy)

The most commonly used NER systems are the following:
- Supervised machine learning :
Supervised machine learning-based systems use ML models trained on texts that humans have pre-labeled with named entity categories. These approaches use algorithms such as conditional random fields and maximum entropy, two statistical models. This method is effective at capturing semantic meaning and other complexities, though it requires large volumes of training data.

- Rules-based systems :
Rules-based systems extract information using a set of pre-set rules. Rules can include capitalizations or titles, such as "Dr." This method requires a lot of human intervention to input, monitor and tweak the rules, and it might miss textual variations that the rules do not cover. It's thought that rules-based systems don't handle complexity as well as machine learning models. There are two primary sets of rules:

  - Pattern-based rules: a pattern-based rule follows a morphological pattern or string of words used in the document.

  - Context-based rules: context-based rules depend on the meaning or the context of the word in the document.
      
- Dictionary-based systems :
Dictionary-based systems use a dictionary with an extensive vocabulary and synonym collection to cross-check and identify named entities. This method might have trouble classifying named entities with variations in spellings.

- Machine learning-based systems :
In machine learning-based systems, statistical modeling is used to detect entities. This approach uses a feature-based representation of the text document. It overcomes several drawbacks of the rules-based and dictionary-based approaches, since the model can recognize entity types despite slight variations in their spellings.
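
The rules-based and dictionary-based approaches described above can be sketched in a few lines. This is a minimal illustration, not a production system: the `GAZETTEER` contents, the title pattern, and the function names are all assumptions made up for this example.

```python
import re

# Hypothetical gazetteer (dictionary): surface form -> entity category.
GAZETTEER = {
    "Samsung": "Organization",
    "Yale University": "Organization",
    "Canada": "Location",
    "Honolulu": "Location",
}

# Pattern-based rule (assumed for illustration): a title such as "Dr."
# followed by one or more capitalized words is tagged as a Person.
TITLE_PATTERN = re.compile(
    r"\b(?:Dr|Mr|Ms|Mrs)\.\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*"
)


def dictionary_entities(text):
    """Return (start, end, surface form, category) for every gazetteer hit."""
    found = []
    for name, category in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.end(), name, category))
    return found


def rule_entities(text):
    """Return (start, end, surface form, category) for every title-pattern hit."""
    return [
        (match.start(), match.end(), match.group(), "Person")
        for match in TITLE_PATTERN.finditer(text)
    ]


def extract_entities(text):
    """Combine both extractors and order the entities by position in the text."""
    return sorted(dictionary_entities(text) + rule_entities(text))


if __name__ == "__main__":
    for entity in extract_entities("Dr. Susan Sarandon visited Yale University in Canada."):
        print(entity)
    # A misspelling is not in the gazetteer, so nothing is found.
    print(extract_entities("Samsumg released a phone."))
```

The second call shows the weakness noted above: a spelling variation that is not in the dictionary is simply missed, which is the gap that machine learning-based systems try to close.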


There are also several emerging NER methods:

- Unsupervised machine learning systems use ML models that have not been trained on annotated text data. Unsupervised learning models are thought to be capable of processing more complex tasks than supervised systems.
- Bootstrapping systems, also known as self-supervised, predictively categorize named entities based on grammatical characteristics, such as capitalization, parts-of-speech tags and other pre-trained categories. A human then fine-tunes the bootstrap system, labeling the system's predictions as correct or incorrect and adding the correct ones to a new training set.
- Neural network systems build an NER model using neural networks, bidirectional architecture learning models, such as Bidirectional Encoder Representations from Transformers, and encoding techniques. This approach minimizes human interaction.
- Statistical systems use probabilistic models trained on textual patterns and relationships to predict named entities in new text data.
- Semantic role labeling systems preprocess an NER model with semantic learning techniques to teach it the context and relationships between categories.
- Hybrid systems use aspects of multiple systems in a combined approach.


<img src="images/machine_learning_rise.png" width="800px" height="800px">

Image source:[Link to source](https://cdn.ttgtmedia.com/rms/onlineImages/machine_learning_rise.png)


# Named Entity Disambiguation (NED) and Named Entity Linking (NEL) and their applications

You might have noticed over the years how search engines and social media recommender systems have gotten smarter and more accurate. Search engines are now capable of parsing ambiguous natural language queries, such as “Where do the Warriors play?” (i.e. Chase Center, San Francisco), and social media platforms are able to recommend posts related to comments you have written or pages you have visited.

The accuracy of these predictions looks fairly magical at times, but there's a lot under the hood to make it happen.

One of the most interesting pieces of this puzzle is called Named Entity Disambiguation (NED), or Entity Linking (EL).

Named Entity Disambiguation is the task of mapping words of interest, such as names of persons, locations and companies, from an input text document to corresponding unique entities in a target Knowledge Base (KB). Words of interest are called Named Entities (NEs), mentions, or surface forms. The target KB depends on the application, but for generic Named Entity Disambiguation systems a common choice is Wikipedia. Usually, Named Entity Disambiguation systems do not use Wikipedia directly. Instead, they rely on databases that contain structured versions of it, such as DBpedia or Wikidata.

For example, if we have a sentence like “The Indiana Pacers and Miami Heat meet at Miami’s American Airlines Arena”, we can link each Named Entity to its Wikipedia page (e.g. en.wikipedia.org/wiki/Indiana_Pacers).

<img src="images/NED.png" width="800px" height="800px">

Image source:[Link to source](https://lh5.googleusercontent.com/fzAtIal2qljvCLpwL2G_6Fp6M6ne05MvJjKrfe-Nlc1WOz_rvJ1dFubXD0atWYH7SJzDVFoqg-sWj-jtbgPXJJEIKXQaFzwVIoaF7uoqJb_iXTefYvygh9sAL0eiWzEIXsTAaXYK)


Named Entity Disambiguation (NED), or Named Entity Linking, is a natural language processing (NLP) task which assigns a unique identity to entities mentioned in text. This can be helpful in text analysis. For example, a financial company may want to identify all companies mentioned within a news article, and subsequently investigate how the relations between the companies might affect the markets.

It is helpful to view NED as a component within an information extraction pipeline applied to a text document. For example, given a document consisting of the sentence “President Ford granted a pardon to President Nixon.”, we would first identify all the words of interest, such as ‘Ford’. This is called Named Entity Recognition (NER). The word ‘Ford’ can refer to President Gerald Ford, Henry Ford, or the car company “Ford”. The second step in the pipeline is therefore candidate selection, where we narrow down the list of possible candidates for the identified words of interest. Finally, we disambiguate entities from the candidate list and link each identified entity to a unique identifier within a knowledge base. In the example above, we should ideally identify the entity ‘Ford’ as ‘Gerald Ford’.

<img src="images/NED 1.png" width="700px" height="700px">

Image source:[Link to source](https://miro.medium.com/v2/resize:fit:828/0*IOOZpjlLaJUWjex9)
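
The candidate selection and disambiguation steps of the pipeline can be sketched as follows. This is only an illustration under strong assumptions: the `KNOWLEDGE_BASE` is a hypothetical three-entry stand-in for something like Wikidata, the mention is assumed to have been found by an earlier NER step, and disambiguation is done by simple word overlap with hand-picked context words rather than by a trained model.

```python
# Hypothetical miniature knowledge base: entity ID -> aliases and context words.
KNOWLEDGE_BASE = {
    "Gerald_Ford": {
        "aliases": {"Ford", "Gerald Ford", "President Ford"},
        "context": {"president", "pardon", "nixon", "white", "house"},
    },
    "Henry_Ford": {
        "aliases": {"Ford", "Henry Ford"},
        "context": {"industrialist", "assembly", "line", "founder"},
    },
    "Ford_Motor_Company": {
        "aliases": {"Ford", "Ford Motor Company"},
        "context": {"car", "truck", "company", "vehicles"},
    },
}


def select_candidates(mention):
    """Candidate selection: every KB entity that lists the mention as an alias."""
    return sorted(
        entity_id
        for entity_id, entry in KNOWLEDGE_BASE.items()
        if mention in entry["aliases"]
    )


def disambiguate(mention, sentence):
    """Pick the candidate whose context words overlap most with the sentence."""
    words = {w.strip(".,;:!?\"'").lower() for w in sentence.split()}
    candidates = select_candidates(mention)
    if not candidates:
        return None  # the mention cannot be linked to this knowledge base
    return max(
        candidates,
        key=lambda entity_id: len(words & KNOWLEDGE_BASE[entity_id]["context"]),
    )


if __name__ == "__main__":
    print(select_candidates("Ford"))
    print(disambiguate("Ford", "President Ford granted a pardon to President Nixon."))
```

Real systems replace the overlap score with learned similarity between the mention's context and the entity's description, but the two-stage shape (candidates first, ranking second) stays the same.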


#  Evaluation Metrics of NER

## CoNLL: Computational Natural Language Learning

“Precision is the percentage of named entities found by the learning model that is correct. Recall is the percentage of named entities present in the corpus that are found by the model. A named entity is correct only if it is an exact match of the corresponding entity in the data file.”

The Language-Independent Named Entity Recognition task introduced at CoNLL-2003 measures the performance of the systems in terms of precision, recall, and f1-score.
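
The exact-match definition quoted above translates directly into code. The sketch below assumes each entity is represented as a `(start_token, end_token, type)` tuple; that representation and the function name are choices made for this example, not part of the CoNLL tooling.

```python
def exact_match_scores(gold, predicted):
    """Precision, recall and F1 where an entity counts only on an exact match.

    Each entity is a (start_token, end_token, type) tuple, so both the
    boundaries and the category must agree with the gold annotation.
    """
    gold_set, predicted_set = set(gold), set(predicted)
    correct = len(gold_set & predicted_set)
    precision = correct / len(predicted_set) if predicted_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = [(0, 2, "ORG"), (5, 7, "MISC"), (10, 11, "PER")]
    # The second prediction has the right type but the wrong right boundary,
    # so under exact matching it does not count as correct.
    predicted = [(0, 2, "ORG"), (5, 6, "MISC")]
    print(exact_match_scores(gold, predicted))
```

Here one of two predictions is correct (precision 0.5) and one of three gold entities is found (recall 1/3), which shows how strict exact matching is: a near miss on a boundary is scored the same as a complete miss.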

## Automatic Content Extraction (ACE)

The ACE challenges use a more complex evaluation metric that includes a weighting schema. Check the references for a deeper understanding.

Replicating experiments and baselines from ACE is somewhat complex because not all of the datasets and results are open and free, so the results and experiments from this challenge will probably fade away with time.

## Message Understanding Conference (MUC)

MUC introduced detailed evaluation metrics that consider different categories of errors. These metrics are defined by comparing the response of a model against the golden annotation:

- Correct (COR): both are the same;
- Incorrect (INC): the output of a system and the golden annotation don’t match;
- Partial (PAR): the system output and the golden annotation are somewhat “similar” but not the same;
- Missing (MIS): a golden annotation is not captured by a system;
- Spurious (SPU): the system produces a response which doesn’t exist in the golden annotation.

source:[Link to Blog](https://umagunturi789.medium.com/everything-you-need-to-know-about-named-entity-recognition-2a136f38c08f)
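
The five MUC categories can be counted with a short function. The sketch below is a simplification under stated assumptions, not the official MUC scorer: entities are `(start, end, type)` tuples with an exclusive end, "somewhat similar" is interpreted as overlapping spans with the same type, and an overlapping span with a different type is counted as incorrect.

```python
def overlaps(a, b):
    """True if two (start, end, type) spans share a token; end is exclusive."""
    return a[0] < b[1] and b[0] < a[1]


def muc_counts(gold, system):
    """Count COR, INC, PAR, MIS and SPU for one document (simplified scheme)."""
    counts = {"COR": 0, "INC": 0, "PAR": 0, "MIS": 0, "SPU": 0}
    matched = set()
    for g in gold:
        candidates = [s for s in system if s not in matched and overlaps(g, s)]
        if not candidates:
            counts["MIS"] += 1          # nothing in the output covers this entity
            continue
        # Prefer an exact match if one exists, otherwise take the first overlap.
        match = g if g in candidates else candidates[0]
        matched.add(match)
        if match == g:
            counts["COR"] += 1          # same boundaries and same type
        elif match[2] == g[2]:
            counts["PAR"] += 1          # same type, boundaries differ
        else:
            counts["INC"] += 1          # overlapping span with the wrong type
    # System entities never paired with a gold entity are spurious.
    counts["SPU"] = sum(1 for s in system if s not in matched)
    return counts


if __name__ == "__main__":
    gold = [(0, 2, "PER"), (4, 6, "ORG"), (8, 9, "LOC"), (11, 13, "ORG")]
    system = [(0, 2, "PER"), (4, 6, "LOC"), (8, 10, "LOC"), (15, 16, "PER")]
    print(muc_counts(gold, system))
```

In this example each category occurs exactly once, which makes it easy to check the function against the definitions in the list above.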

#  Pre-trained models such as BERT and its fine-tuning for NER tasks


## Introduction

Named Entity Recognition is a major task in the Natural Language Processing (NLP) field. It detects the entities in text for use in downstream tasks, since some words are more informative and essential for a given context than others. This is why NER is often described as an information extraction task: it extracts relevant keywords from the text and classifies them into the required classes.
With the help of Named Entity Recognition, we can extract people, places, organizations, and so on from general text, and also domain-specific entities such as clinical terms, medications, and diseases from medical records for better diagnosis.


## Bidirectional Encoder Representations from Transformers (BERT)

BERT is a general-purpose language model pre-trained on a large dataset. It can be fine-tuned and used for different tasks such as sentiment analysis, question answering, and named entity recognition. BERT is one of the most widely used models for transfer learning in NLP.
BERT architecture:

<img src="images/bert.png" width="700px" height="700px">

Image source:[Link to source](https://www.analyticsvidhya.com/blog/2020/07/transfer-learning-for-nlp-fine-tuning-bert-for-text-classification/)
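
One practical detail when fine-tuning BERT for NER is that BERT splits words into sub-tokens, while NER labels are given per word. A common way to handle this is to keep the label on the first sub-token of each word and ignore the rest in the loss. The sketch below assumes a `word_ids` list like the one returned by `word_ids()` on the output of a Hugging Face fast tokenizer; the sub-token split and the label IDs in the example are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def align_labels(word_labels, word_ids):
    """Expand word-level label IDs to sub-token level.

    word_labels: one label ID per word.
    word_ids:    for every sub-token, the index of the word it came from,
                 or None for special tokens such as [CLS] and [SEP].
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(IGNORE_INDEX)            # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])    # first sub-token keeps the label
        else:
            aligned.append(IGNORE_INDEX)            # later sub-tokens are ignored
        previous = word_id
    return aligned


if __name__ == "__main__":
    # Words: Honolulu is in Hawaii, with 1 = Location and 0 = not an entity.
    word_labels = [1, 0, 0, 1]
    # Hypothetical split: [CLS] Hon ##ol ##ulu is in Hawaii [SEP]
    word_ids = [None, 0, 0, 0, 1, 2, 3, None]
    print(align_labels(word_labels, word_ids))
```

Labeling only the first sub-token is one convention among several; copying the label to every sub-token is another option, and which works better depends on the dataset.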


#  Transfer learning for Named Entity Recognition

## Transfer Learning in NLP

Transfer learning is a technique where a deep learning model trained on a large dataset is reused to perform similar tasks on another dataset. We call such a deep learning model a pre-trained model. The most renowned examples of pre-trained models are the computer vision deep learning models trained on the ImageNet dataset. It is usually better to use a pre-trained model as a starting point to solve a problem rather than building a model from scratch.
In NLP, a wide range of transformer-based models has emerged for different tasks. There are multiple advantages of using transformer-based models, but the most important ones are:

## First Benefit

These models do not process an input sequence token by token. Instead, they take the entire sequence as input in one go, which is a big improvement over RNN-based models because the computation can be parallelized on GPUs.

## Second Benefit

We don’t need labeled data to pre-train these models. We only have to provide a huge amount of unlabeled text data to train a transformer-based model. We can then use this trained model for other NLP tasks like text classification, named entity recognition, text generation, etc. This is how transfer learning works in NLP.
BERT and GPT-2 are among the most popular transformer-based models. For NER, a pre-trained BERT model is fine-tuned to predict an entity label for each token.
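
To see why no labels are needed for pre-training, it helps to look at how a training example is built from raw text. BERT is pre-trained with masked language modelling: some tokens are hidden and the model has to predict them, so the text supplies its own targets. The sketch below is a simplified illustration; the function name, the whitespace tokenization, and the fixed seed are assumptions, and real BERT pre-training also sometimes swaps in a random token or leaves the chosen token unchanged.

```python
import random

MASK = "[MASK]"


def make_masked_example(tokens, mask_prob=0.15, seed=0):
    """Turn unlabeled tokens into (inputs, targets) for masked language modelling."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    inputs, targets = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(token)   # the model must recover the hidden token
        else:
            inputs.append(token)
            targets.append(None)    # no loss is computed at this position
    return inputs, targets


if __name__ == "__main__":
    tokens = "Yale University is located in New Haven".split()
    inputs, targets = make_masked_example(tokens, mask_prob=0.3)
    print(inputs)
    print(targets)
```

The targets come entirely from the original sentence, which is why large amounts of unlabeled text are enough for this stage. Labeled NER data is only needed later, for fine-tuning.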

# Active learning and its application in Named Entity Recognition

<img src="images/active learning.png" width="700px" height="700px">

Image source:[Link to source](https://i0.wp.com/neptune.ai/wp-content/uploads/2022/10/Active-Learning.png?resize=646%2C514&ssl=1)


<img src="images/activelearning NER.png" width="700px" height="700px">

Image source:[Link to source](https://dl.acm.org/doi/10.1145/3593023)


- Workflow of the active learning approach for named entity recognition

<img src="images/Workflow-of-the-active-learning-approach-for-named-entity-recognition.png" width="700px" height="700px">

Image source:[Link to source](https://www.researchgate.net/publication/354192561/figure/fig1/AS:11431281153245323@1682387958490/Workflow-of-the-active-learning-approach-for-named-entity-recognition.png)


<img src="images/ner active learning.png" width="700px" height="700px">

Image source:[Link to source](https://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs13748-021-00230-w/MediaObjects/13748_2021_230_Fig1_HTML.png)

## Active learning use case in NLP (NER)

A use case for improving a Named Entity Recognition (NER) model using active learning is discussed below. The paper behind this use case takes a deep dive into active learning for NER. It compares several active learning strategies (scoring metrics) against a random sample selected for training at every iteration. The dataset used for benchmarking is OntoNotes 5.0.

<img src="images/active learning use case.png" width="700px" height="700px">

Image source:[Link to source](https://neptune.ai/blog/active-learning-strategies-tools-use-cases)

As the results above show, all of the active learning strategies outperform the random sampling (RAND) baseline by a good margin.

Another representation, shown below, plots the performance improvement of the different active learning strategies against the number of iterations. It is compared with training data obtained using random sampling.


<img src="images/active learning use case 2.png" width="700px" height="700px">

Image source:[Link to source](https://neptune.ai/blog/active-learning-strategies-tools-use-cases)
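
The selection step of an active learning loop can be sketched as follows. This example uses least confidence, one commonly used uncertainty score; it is an assumption for illustration and not necessarily one of the strategies in the benchmark above. The `pool` structure (sentence ID mapped to per-token label probabilities from the current model) and the function names are also hypothetical.

```python
import math


def least_confidence(token_probabilities):
    """Uncertainty of one sentence.

    token_probabilities holds, for each token, the model's probability
    distribution over labels. The sentence confidence is approximated as the
    product of each token's highest label probability.
    """
    confidence = math.prod(max(probs) for probs in token_probabilities)
    return 1.0 - confidence


def select_for_annotation(pool, budget):
    """Return the IDs of the `budget` most uncertain sentences in the pool."""
    ranked = sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    pool = {
        "s1": [[0.9, 0.1], [0.95, 0.05]],   # fairly confident
        "s2": [[0.5, 0.5], [0.6, 0.4]],     # very uncertain
        "s3": [[0.99, 0.01]],               # almost certain
    }
    print(select_for_annotation(pool, budget=2))
```

In a full loop, the selected sentences would be sent to a human annotator, added to the training set, and the model retrained before the next round of scoring. Random sampling, the baseline in the plots above, corresponds to picking sentences without looking at the scores.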

Visit the source link above for a better understanding.


