# Named Entity Recognition (NER)

Named Entity Recognition (NER) is a subtask of information extraction that locates named entities mentioned in unstructured text and classifies them into predefined categories. It is a core natural language processing (NLP) technique for turning free text into structured information, such as which people, organizations, and places a document mentions.

## What is Named Entity Recognition?

NER automatically identifies and categorizes entities such as:
- **Person names** (John, Mary, Dr. Smith)
- **Organizations** (Google, NASA, United Nations)
- **Locations** (New York, India, Mount Everest)
- **Dates and Times** (January 2023, 3:00 PM)
- **Monetary values** ($100, €50)
- **Percentages** (25%, 0.5%)
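To make this concrete, NER output is typically stored as labeled spans over the original text rather than as a modified string. The sketch below uses a made-up sentence with hand-written annotations. The `spans` list and the `render` helper are hypothetical and not produced by any model. Each span uses character offsets whose end index is exclusive, as in Python slicing.

```python
# Hypothetical, hand-written annotations (not model output) for a made-up sentence.
text = "Dr. Smith paid $100 to NASA in January 2023."

# Each entity is a (start, end, label) span; end is exclusive, as in slicing.
spans = [
    (0, 9, "PERSON"),
    (15, 19, "MONEY"),
    (23, 27, "ORG"),
    (31, 43, "DATE"),
]


def render(text, spans):
    """Return (surface text, label) pairs for each annotated span."""
    return [(text[start:end], label) for start, end, label in spans]


for surface, label in render(text, spans):
    print(f"{label:<7} {surface}")
```

Keeping offsets instead of copies of the text means the original string is never altered, and overlapping or adjacent entities stay unambiguous.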

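## Common NER Tags

The tags below are the 18 entity types of the OntoNotes 5.0 annotation scheme, which spaCy's English pipelines also use. Other toolkits define their own label sets; NLTK's `ne_chunk`, used later in this notebook, is one example.
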
### **Person & Organization**
- **PERSON** - People, including fictional characters
- **ORG** - Companies, agencies, institutions, organizations
- **NORP** - Nationalities, religious/political groups

### **Location & Geopolitical**
- **GPE** - Countries, cities, states (Geopolitical entities)
- **LOC** - Non-GPE locations, mountain ranges, bodies of water
- **FAC** - Buildings, airports, highways, bridges

### **Time & Numerical**
- **DATE** - Absolute or relative dates or periods
- **TIME** - Times smaller than a day
- **PERCENT** - Percentage values
- **MONEY** - Monetary values, including unit
- **QUANTITY** - Measurements, weights, distances
- **ORDINAL** - First, second, third, etc.
- **CARDINAL** - Numerals that don't fall under other types

### **Other Important Tags**
- **EVENT** - Named hurricanes, battles, wars, sports events
- **WORK_OF_ART** - Titles of books, songs, movies
- **LAW** - Named documents made into laws
- **LANGUAGE** - Any named language
- **PRODUCT** - Objects, vehicles, foods (not services)
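In code, a tag set is simply a fixed vocabulary of string labels. As a minimal sketch, the grouping used in the headings above can be encoded as a dictionary and used to filter a model's output. `TAG_GROUPS`, `group_of`, `keep_groups`, and the sample entities are hypothetical names and data chosen for illustration, and the grouping itself is this notebook's, not part of the OntoNotes scheme.

```python
# Grouping mirrors the headings above; it is not defined by OntoNotes itself.
TAG_GROUPS = {
    "Person & Organization": {"PERSON", "ORG", "NORP"},
    "Location & Geopolitical": {"GPE", "LOC", "FAC"},
    "Time & Numerical": {"DATE", "TIME", "PERCENT", "MONEY",
                         "QUANTITY", "ORDINAL", "CARDINAL"},
    "Other": {"EVENT", "WORK_OF_ART", "LAW", "LANGUAGE", "PRODUCT"},
}


def group_of(tag):
    """Return the category name for a tag, or None if the tag is unknown."""
    for group, tags in TAG_GROUPS.items():
        if tag in tags:
            return group
    return None


def keep_groups(entities, groups):
    """Keep only (text, tag) pairs whose tag belongs to one of the given groups."""
    return [(text, tag) for text, tag in entities if group_of(tag) in groups]


# Hypothetical model output, for illustration only.
entities = [("Gustave Eiffel", "PERSON"), ("Paris", "GPE"),
            ("1889", "DATE"), ("two", "CARDINAL")]
print(keep_groups(entities, {"Time & Numerical"}))
```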

```python
sentence = "The Eiffel Tower was built from 1887 to 1889 by Gustave Eiffel, whose company specialized in building metal frameworks and structures."
```

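```python
import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt_tab')  # tokenizer models required by word_tokenize (NLTK 3.9+)
tokens = word_tokenize(sentence)
```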
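`word_tokenize` splits the sentence into words and gives punctuation such as the comma and the final period their own tokens. NLTK does this with a Punkt sentence splitter followed by a Treebank-style word tokenizer. As a rough, hypothetical stand-in that needs no downloaded models, a single regular expression reproduces that behavior for simple sentences like this one. It is not NLTK's algorithm: it handles contractions, quotes, and abbreviations differently (Treebank-style tokenizers split "don't" into "do" and "n't", for example).

```python
import re

# Simplified stand-in for word_tokenize (not NLTK's algorithm): runs of word
# characters become tokens, and every other non-space character is its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    return TOKEN_RE.findall(text)


sentence = ("The Eiffel Tower was built from 1887 to 1889 by Gustave Eiffel, "
            "whose company specialized in building metal frameworks and structures.")
print(simple_tokenize(sentence))
```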

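```python
import nltk
from nltk import pos_tag

nltk.download('averaged_perceptron_tagger_eng')  # tagger model required by pos_tag (NLTK 3.9+)
tagged_tokens = pos_tag(tokens)  # list of (token, Penn Treebank tag) pairs
```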
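`pos_tag` returns a list of `(token, tag)` pairs using Penn Treebank tags, where `NNP` marks a singular proper noun and `CD` a cardinal number. These tags are the raw material for the chunker. A naive baseline, sketched below with hand-assigned tags (illustrative, not actual `pos_tag` output), simply joins consecutive proper nouns. It finds candidate spans such as "Gustave Eiffel" but cannot say whether a span is a person or an organization; assigning that label is what the trained chunker in the next step adds.

```python
from itertools import groupby

# Hand-assigned tags in the same (token, tag) shape that pos_tag returns.
# Illustrative only; the real tagger may assign some tags differently.
tagged = [
    ("The", "DT"), ("Eiffel", "NNP"), ("Tower", "NNP"), ("was", "VBD"),
    ("built", "VBN"), ("from", "IN"), ("1887", "CD"), ("to", "TO"),
    ("1889", "CD"), ("by", "IN"), ("Gustave", "NNP"), ("Eiffel", "NNP"),
    (",", ","), ("whose", "WP$"), ("company", "NN"), ("specialized", "VBD"),
    ("in", "IN"), ("building", "VBG"), ("metal", "NN"), ("frameworks", "NNS"),
    ("and", "CC"), ("structures", "NNS"), (".", "."),
]


def proper_noun_runs(tagged_tokens):
    """Naive baseline: join consecutive proper nouns (NNP/NNPS) into candidates."""
    runs = []
    is_proper = lambda pair: pair[1] in ("NNP", "NNPS")
    for proper, group in groupby(tagged_tokens, key=is_proper):
        if proper:
            runs.append(" ".join(word for word, _tag in group))
    return runs


print(proper_noun_runs(tagged))
```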

```python
import nltk
nltk.download('maxent_ne_chunker')
nltk.download('maxent_ne_chunker_tab')
nltk.download('words')
```

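```python
from nltk import ne_chunk

tree = ne_chunk(tagged_tokens)  # nltk.Tree: each entity becomes a labeled subtree
print(tree)
# tree.draw()  # optional: opens a Tkinter window, so it needs a graphical display
```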
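`ne_chunk` returns an `nltk.Tree` whose top-level children are either `(word, tag)` pairs or subtrees labeled with an entity type. NLTK's chunker uses its own ACE-style labels (such as `PERSON`, `ORGANIZATION`, and `GPE`) rather than the OntoNotes tags listed earlier, and `ne_chunk(tagged_tokens, binary=True)` collapses them all into a single `NE` label. To turn the tree into a flat list of entities, a sketch like the one below walks the top-level children. So that it runs without NLTK, it defines a minimal hypothetical `Tree` stand-in with the same shape (a list with a `label()` method); the labels in the sample tree are chosen for illustration and are not guaranteed to match what the model predicts.

```python
# Hypothetical stand-in for nltk.Tree so this sketch runs without NLTK:
# like nltk.Tree, it is a list of children that also has a label() method.
class Tree(list):
    def __init__(self, node, children):
        super().__init__(children)
        self._label = node

    def label(self):
        return self._label


def extract_entities(tree):
    """Return (entity text, label) pairs from a ne_chunk-style tree.

    Top-level children are either (word, tag) tuples outside any entity,
    or subtrees whose label() is the entity type and whose children are
    the (word, tag) tuples inside the entity.
    """
    entities = []
    for child in tree:
        if hasattr(child, "label"):  # a labeled entity subtree
            text = " ".join(word for word, _tag in child)
            entities.append((text, child.label()))
    return entities


# Sample tree for the start of the example sentence; labels are illustrative.
tree = Tree("S", [
    ("The", "DT"),
    Tree("ORGANIZATION", [("Eiffel", "NNP"), ("Tower", "NNP")]),
    ("was", "VBD"), ("built", "VBN"), ("from", "IN"), ("1887", "CD"),
    ("to", "TO"), ("1889", "CD"), ("by", "IN"),
    Tree("PERSON", [("Gustave", "NNP"), ("Eiffel", "NNP")]),
    (",", ","),
])
print(extract_entities(tree))
```

With NLTK installed, the same function accepts the real tree directly, e.g. `extract_entities(ne_chunk(tagged_tokens))`, since `nltk.Tree` is also a list of children with a `label()` method.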