# Week 6: Named Entity Recognition

- Introduction to named entity recognition and its types
- Understanding Named Entity Recognition (NER) and its applications
- Introduction to different NER techniques such as rule-based, statistical, and neural models
- Understanding Named Entity Disambiguation (NED) and its applications
- Understanding Named Entity Linking (NEL) and its applications
- Implementing Named Entity Recognition models using PyTorch or TensorFlow
- Understanding evaluation metrics for Named Entity Recognition
- Introduction to pre-trained models such as BERT and fine-tuning them for NER tasks
- Understanding transfer learning for Named Entity Recognition
- Understanding active learning and its application in Named Entity Recognition
- Understanding unsupervised techniques for Named Entity Recognition
- Understanding the role of named entities in NLP tasks such as text summarization, text generation, and machine translation
- Understanding data preparation and data cleaning for Named Entity Recognition tasks
- Understanding the role of ensemble models in Named Entity Recognition

# Named entity recognition and its types

## What is Named Entity Recognition?

Named Entity Recognition (NER) is a subtask of Natural Language Processing (NLP). Its primary objective is to locate named entities in structured and unstructured text and classify them into predefined categories. Common categories include person names, locations, companies, times, monetary values, events, and more.

In a nutshell, NER deals with:

- Named entity recognition/detection: identifying a word or sequence of words in a document that refers to an entity.
- Named entity classification: assigning every detected entity to one of the predefined categories.

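In practice, detection and classification are often done together by tagging each token with a label in the BIO scheme: `B-X` marks the first token of an entity of type X, `I-X` a following token of the same entity, and `O` a token outside any entity. The sketch below assumes this scheme and uses hand-written tags for a toy sentence (no model is involved); it only shows how token tags are grouped into detected spans, each carrying its category. The function name `bio_to_spans` is hypothetical.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) pairs.

    B-X starts an entity of type X, I-X continues it, O is outside any entity.
    An I-X that does not continue an open X entity starts a new entity.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None

    def flush() -> None:
        # Close the entity being built (if any) and record it.
        nonlocal current_tokens, current_label
        if current_tokens:
            spans.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "B" or label != current_label:
            flush()  # a new entity begins here
            current_label = label
        current_tokens.append(token)
    flush()
    return spans


tokens = ["Barack", "Obama", "visited", "Google", "in", "Canada", "."]
tags = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('Barack Obama', 'PER'), ('Google', 'ORG'), ('Canada', 'LOC')]
```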
<img src="images/NER.png" width="500px" height="500px">

Image source: [Link to source](https://pbs.twimg.com/media/E78RgJrWEAI4Inb?format=png&name=small)


<img src="images/app with discription.png" width="500px" height="500px">

Image source: [Link to source](https://devopedia.org/images/article/256/8660.1580659054.png)



<img src="images/NER types.png" width="500px" height="500px">

Image source: [Link to source](https://lh3.googleusercontent.com/GS2laY6y-wSeQqGmbFlgTZ57Cyid2E5Q0awfy9X5EjIKnmmm0GvgSFMHfYHUR027Rip1karv-9VVxHGiso5AdN4eyhmeOyq_8L8FWYDk7I17BIbInSCwJ9dMBcuqXkbeM-tRX1-z)


Some common examples of predefined entity categories are:

- Person: Michael Jackson, Oprah Winfrey, Barack Obama, Susan Sarandon

- Location: Canada, Honolulu, Bangkok, Brazil, Cambridge

- Organization: Samsung, Disney, Yale University, Google

- Time: 15.35, 12 PM

Other categories include numerical values, expressions, e-mail addresses, and facilities.

## Ambiguity in Named Entity Recognition

Humans can usually tell intuitively which category a term belongs to. Computers cannot: the same string may belong to different categories depending on its context. For example:

In "Manchester City won the Premier League trophy", "Manchester City" is an organization (the football club), whereas in "Manchester City was a textile and industrial powerhouse", the same name refers to a location.

Your NER model needs training data that resembles the text it will process in order to extract and classify entities accurately. If you train your model on Shakespearean English, needless to say, it won’t be able to decipher Instagram posts.

<img src="images/Common-Examples-of-NER.jpg" width="500px" height="500px">

Image source: [Link to source](https://www.shaip.com/wp-content/uploads/2022/02/Blog_Common-Examples-of-NER_500x350.jpg)


<img src="images/few-nerd.png" width="500px" height="500px">

Image source: [Link to source](https://production-media.paperswithcode.com/datasets/few-nerd.png)



# Applications

Let’s discuss some of the interesting use cases of Named Entity Recognition:
- Customer Support 

<img src="images/Customer-Service app 1.png" width="500px" height="500px">

Image source: [Link to source](https://blog.happyfox.com/wp-content/uploads/2020/10/Customer-Service-Vs-Customer-Support-Vs-Customer-Success.png)

- Gaining Insights from Customer Feedback

<img src="images/app 2 listen.jpg" width="500px" height="500px">

Image source: [Link to source](https://zuyder.files.wordpress.com/2014/06/listen1.jpg)

- Recommendation System

<img src="images/app 3 recomendation.png" width="500px" height="500px">

Image source: [Link to source](https://www.mdpi.com/applsci/applsci-10-05510/article_deploy/html/images/applsci-10-05510-g001.png)

- Summarizing Resumes

<img src="images/resume_summary app 4.jpg" width="500px" height="500px">

Image source: [Link to source](https://cdn-images.zety.com/pages/resume_summary_on_a_template_dark.jpg)




# Different NER techniques such as rule-based, statistical and neural models

<img src="images/types of ner.png" width="500px" height="500px">

Image source: [Link to source](https://www.turing.com/kb/how-to-train-custom-ner-model-using-spacy)

The most commonly used NER approaches are the following:
- Supervised machine learning:
Supervised machine learning-based systems use ML models trained on texts that humans have pre-labeled with named entity categories. They rely on algorithms such as conditional random fields and maximum entropy models, two statistical modeling techniques. This approach handles semantic meaning and other complexities well, though it requires large volumes of labeled training data.

- Rules-based systems:
Rules-based systems extract entities using a set of predefined, hand-written rules. Rules can rely on cues such as capitalization or titles like "Dr.". This method requires a lot of human effort to write, monitor, and tweak the rules, and it can miss textual variations that the rules do not anticipate. Rules-based systems are generally considered less able to handle complexity than machine learning models. There are two primary kinds of rules:

  - Pattern-based rules: as the name suggests, a pattern-based rule follows a morphological pattern or string of words used in the document.

  - Context-based rules: these depend on the meaning or the context of the word in the document.
      
- Dictionary-based systems:
Dictionary-based systems cross-check text against a dictionary with an extensive vocabulary and synonym collection to identify named entities. This method can have trouble with named entities whose spelling differs from the dictionary entry.

- Machine learning-based systems:
Machine learning-based systems use statistical modeling to detect entities, working from a feature-based representation of the text. This overcomes several drawbacks of the rules-based and dictionary-based approaches, since the model can recognize entity types despite slight variations in spelling.

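The two kinds of rules in a rules-based system can be sketched with Python's standard `re` module. This is a minimal illustration, not a standard rule set: the regular expressions, the labels, and the cue-word sets (`LOCATION_CUES`, `ORG_CUES`) are hypothetical choices, and the context rule echoes the "Manchester City" ambiguity discussed earlier.

```python
import re
from typing import List, Tuple

# Pattern-based rules (illustrative): each label is matched by a surface pattern.
PATTERN_RULES = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    # 24-hour times such as 15.35 or 15:35, and 12-hour times such as 12 PM.
    ("TIME", re.compile(r"\b(?:[01]?\d|2[0-3])[:.][0-5]\d\b|\b(?:1[0-2]|0?[1-9]) ?(?:AM|PM)\b")),
    # A title such as "Dr." followed by one or more capitalized words.
    ("PERSON", re.compile(r"\b(?:Dr|Mr|Mrs|Ms|Prof)\. [A-Z][a-z]+(?: [A-Z][a-z]+)*")),
]

# Context-based rules (hypothetical cue words): the label of a capitalized
# phrase depends on the word just before or just after it.
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*")
LOCATION_CUES = {"in", "near", "from", "to"}        # word right before the phrase
ORG_CUES = {"won", "signed", "hired", "announced"}  # word right after the phrase


def apply_rules(text: str) -> List[Tuple[str, str]]:
    """Return (entity_text, label) pairs found by the pattern and context rules."""
    entities: List[Tuple[str, str]] = []
    for label, pattern in PATTERN_RULES:
        entities.extend((m.group(), label) for m in pattern.finditer(text))
    for m in CAPITALIZED.finditer(text):
        words_before = text[:m.start()].split()
        words_after = text[m.end():].split()
        prev_word = words_before[-1].lower() if words_before else ""
        next_word = words_after[0].lower() if words_after else ""
        if prev_word in LOCATION_CUES:
            entities.append((m.group(), "LOC"))
        elif next_word in ORG_CUES:
            entities.append((m.group(), "ORG"))
    return entities


text = ("Manchester City won the trophy. Dr. Susan Sarandon flew to Bangkok "
        "at 12 PM; write to jane.doe@example.com.")
for entity in apply_rules(text):
    print(entity)
# ('jane.doe@example.com', 'EMAIL')
# ('12 PM', 'TIME')
# ('Dr. Susan Sarandon', 'PERSON')
# ('Manchester City', 'ORG')
# ('Bangkok', 'LOC')
```

The brittleness described above shows up quickly: the `TIME` pattern also fires on a decimal such as `3.14`, and every such case needs another hand-written fix.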

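A dictionary-based system can be sketched as a greedy longest-match lookup of word sequences in a gazetteer (an entity dictionary). The gazetteer below is hypothetical and built from the example entities listed earlier; the tokenizer is deliberately naive. The second call illustrates the spelling-variation weakness: a misspelled name is simply not found.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer; keys are lower-cased so lookup ignores capitalization.
GAZETTEER: Dict[str, str] = {
    "barack obama": "PERSON",
    "oprah winfrey": "PERSON",
    "yale university": "ORG",
    "google": "ORG",
    "samsung": "ORG",
    "canada": "LOC",
    "honolulu": "LOC",
    "bangkok": "LOC",
}
# Length (in tokens) of the longest dictionary entry.
MAX_LEN = max(len(name.split()) for name in GAZETTEER)


def dictionary_ner(text: str) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of token n-grams in the gazetteer."""
    # Naive tokenization: treat commas and periods as whitespace.
    tokens = text.replace(",", " ").replace(".", " ").split()
    entities: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            label = GAZETTEER.get(phrase.lower())
            if label is not None:
                entities.append((phrase, label))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return entities


print(dictionary_ner("Barack Obama spoke at Yale University and in Honolulu."))
# [('Barack Obama', 'PERSON'), ('Yale University', 'ORG'), ('Honolulu', 'LOC')]
print(dictionary_ner("Barak Obama visited Canada."))
# [('Canada', 'LOC')]  (the misspelled "Barak Obama" is missed)
```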
There are also several emerging NER methods:

- Unsupervised machine learning systems use ML models that are not pre-trained on annotated text data; they learn from unlabeled text instead. Unsupervised learning models are sometimes considered capable of processing more complex tasks than supervised systems.
- Bootstrapping systems, also described as self-supervised, categorize candidate named entities based on grammatical characteristics such as capitalization, part-of-speech tags, and other pre-trained categories. A human then fine-tunes the bootstrap system by labeling its predictions as correct or incorrect and adding the correct ones to a new training set.
- Neural network systems build an NER model using neural networks, including bidirectional models such as BERT (Bidirectional Encoder Representations from Transformers), together with text-encoding techniques. This approach minimizes human intervention such as writing rules or hand-crafting features.
- Statistical systems use probabilistic models trained on textual patterns and relationships to predict named entities in new text data.
- Semantic role labeling systems use semantic learning techniques to teach an NER model the context and relationships between categories.
- Hybrid systems use aspects of multiple systems in a combined approach.

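Both the machine learning-based and the statistical systems above learn from a feature-based representation of the text. The sketch below uses a multiclass perceptron over hand-crafted token features as a deliberately simple stand-in for the conditional random field and maximum entropy models mentioned earlier; it is not either of those models. The feature set, the label names, and the four toy training sentences are assumptions for illustration, and a model trained on so little data will not generalize reliably.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sentence = Tuple[List[str], List[str]]  # (tokens, gold labels)


def token_features(tokens: List[str], i: int) -> List[str]:
    """Feature-based representation of the token at position i."""
    word = tokens[i]
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    return [
        "bias",
        "word=" + word.lower(),
        "capitalized=" + str(word[:1].isupper()),
        "suffix3=" + word[-3:].lower(),
        "prev_word=" + prev_word,
    ]


class PerceptronTagger:
    """Multiclass perceptron with one weight per (feature, label) pair."""

    def __init__(self, labels: List[str]):
        self.labels = labels
        self.weights: Dict[Tuple[str, str], float] = defaultdict(float)

    def predict(self, features: List[str]) -> str:
        # Score every label; ties go to the label listed first.
        return max(
            self.labels,
            key=lambda label: sum(self.weights.get((f, label), 0.0) for f in features),
        )

    def train(self, data: List[Sentence], max_epochs: int = 1000) -> int:
        """Update weights on each mistake; stop after an error-free epoch."""
        for epoch in range(1, max_epochs + 1):
            mistakes = 0
            for tokens, gold_tags in data:
                for i, gold in enumerate(gold_tags):
                    feats = token_features(tokens, i)
                    guess = self.predict(feats)
                    if guess != gold:
                        mistakes += 1
                        for f in feats:
                            self.weights[(f, gold)] += 1.0
                            self.weights[(f, guess)] -= 1.0
            if mistakes == 0:
                return epoch
        return max_epochs


# Toy training data (illustrative only).
TRAIN: List[Sentence] = [
    (["Barack", "Obama", "visited", "Canada"], ["PER", "PER", "O", "LOC"]),
    (["Google", "hired", "Susan", "Sarandon"], ["ORG", "O", "PER", "PER"]),
    (["Oprah", "Winfrey", "flew", "to", "Bangkok"], ["PER", "PER", "O", "O", "LOC"]),
    (["Samsung", "opened", "an", "office", "in", "Brazil"], ["ORG", "O", "O", "O", "O", "LOC"]),
]

tagger = PerceptronTagger(["O", "PER", "ORG", "LOC"])
epochs = tagger.train(TRAIN)
sentence = ["Disney", "hired", "Michael", "Jackson", "in", "Honolulu"]
print(epochs, [tagger.predict(token_features(sentence, i)) for i in range(len(sentence))])
```

Because the features describe capitalization, suffixes, and the preceding word rather than only the exact string, such a model can in principle assign a label to words it never saw in training, unlike a pure dictionary lookup.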

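The bootstrapping loop described above (the system predicts, a human marks predictions correct or incorrect, correct ones join a training set) can be sketched as follows. This is a simplified, hypothetical illustration: it only looks for locations, uses capitalization plus a preceding cue word as its "grammatical characteristics", and replaces the human annotator with a `reviewer` function; real bootstrapping systems use richer signals such as part-of-speech tags.

```python
from typing import Callable, List, Set, Tuple

Entity = Tuple[str, str]  # (entity_text, label)


def propose(sentences: List[List[str]], cues: Set[str]) -> List[Entity]:
    """Predict LOC candidates: a capitalized token right after a cue word."""
    return [
        (word, "LOC")
        for tokens in sentences
        for prev, word in zip(tokens, tokens[1:])
        if prev.lower() in cues and word[:1].isupper()
    ]


def bootstrap(
    sentences: List[List[str]],
    seed_cues: Set[str],
    is_correct: Callable[[Entity], bool],
    rounds: int = 2,
) -> Tuple[Set[Entity], Set[Entity], Set[str]]:
    cues = set(seed_cues)
    accepted: Set[Entity] = set()  # the growing training set
    rejected: Set[Entity] = set()
    for _ in range(rounds):
        # 1. The system predicts new entities from its current cues.
        for candidate in propose(sentences, cues):
            if candidate in accepted or candidate in rejected:
                continue
            # 2. A human marks each prediction as correct or incorrect.
            if is_correct(candidate):
                accepted.add(candidate)
            else:
                rejected.add(candidate)
        # 3. Words preceding accepted entities become cues for the next round.
        known = {text for text, _ in accepted}
        for tokens in sentences:
            for prev, word in zip(tokens, tokens[1:]):
                if word in known:
                    cues.add(prev.lower())
    return accepted, rejected, cues


SENTENCES = [
    "Oprah Winfrey flew to Bangkok".split(),
    "Heavy rain fell in Bangkok".split(),
    "Samsung opened an office in Brazil".split(),
    "She works in Marketing".split(),
]


def reviewer(entity: Entity) -> bool:
    """Hypothetical stand-in for the human annotator."""
    return entity[0] in {"Bangkok", "Brazil", "Honolulu"}


accepted, rejected, cues = bootstrap(SENTENCES, {"to"}, reviewer)
print(sorted(accepted))  # [('Bangkok', 'LOC'), ('Brazil', 'LOC')]
print(sorted(rejected))  # [('Marketing', 'LOC')]
```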
<img src="images/machine_learning_rise.png" width="800px" height="800px">

Image source: [Link to source](https://cdn.ttgtmedia.com/rms/onlineImages/machine_learning_rise.png)
