In [27]:
import numpy as np
import pandas as pd

In [28]:
data = """Jayaram Jayalalithaa[a] (24 February 1948 – 5 December 2016) was an Indian politician and film actress who served six times as the Chief Minister of Tamil Nadu for over fourteen years between 1991 and 2016. From 9 February 1989, she was the general secretary of the All India Anna Dravida Munnetra Kazhagam (AIADMK), a Dravidian party whose cadre revered her as their "Amma" (mother) and Puratchi Thalaivi (revolutionary leader). Her critics in the media and the opposition accused her of fostering a personality cult and of demanding absolute loyalty from AIADMK legislators and ministers, who often publicly prostrated themselves before her.[3]

Jayalalithaa first came into prominence as a leading film actress in the mid-1960s. Though she had entered the profession reluctantly, upon the urging of her mother to support the family, Jayalalithaa worked prolifically. She appeared in 140 films between 1961 and 1980, primarily in the Tamil, Telugu and Kannada languages. Jayalalithaa received praise for her versatility as an actress and for her dancing skills, earning the sobriquet "Queen of Tamil Cinema".[4] Among her frequent co-stars was M. G. Ramachandran, a Tamil cultural icon who leveraged his immense popularity with the masses into a successful political career. In 1982, when MGR was chief minister, Jayalalithaa joined the AIADMK, the party he founded. Her political rise was rapid; within a few years she became AIADMK propaganda secretary and was elected to the Rajya Sabha, the upper house of India's Parliament. After MGR's death in 1987, Jayalalithaa proclaimed herself his political heir and, having fought off the faction headed by Janaki Ramachandran, MGR's widow, emerged as the sole leader of the AIADMK. Following the 1989 election, she became Leader of the Opposition to the DMK-led government headed by Karunanidhi, her bête noire.

In 1991 Jayalalithaa became chief minister, Tamil Nadu's youngest, for the first time. She earned a reputation for centralising state power among a coterie of bureaucrats; her council of ministers, whom she often shuffled around, were largely ceremonial in nature. The successful cradle-baby scheme, which enabled mothers to anonymously offer their newborns for adoption, emerged during this time. Despite an official salary of only a rupee a month, Jayalalithaa indulged in public displays of wealth, culminating in a lavish wedding for her foster son in 1995. In the 1996 election, the AIADMK was nearly wiped out at the hustings; Jayalalithaa herself lost her seat. The new Karunanidhi government filed several corruption cases against her, and she had to spend time in jail. Her fortunes revived in the 1998 general election, as the AIADMK became a key component of Prime Minister Atal Bihari Vajpayee's 1998–99 government; her withdrawal of support toppled it and triggered another general election just a year later.

The AIADMK returned to power in 2001, although Jayalalithaa was personally disbarred from contesting due to the corruption cases. Within a few months of her taking oath as chief minister, in September 2001, she was disqualified from holding office and forced to cede the chair to loyalist O. Panneerselvam. Upon her acquittal six months later, Jayalalithaa returned as chief minister to complete her term. Noted for its ruthlessness to political opponents, many of whom were arrested in midnight raids, her government grew unpopular. Another period (2006–11) in the opposition followed, before Jayalalithaa was sworn in as chief minister for the fourth time after the AIADMK swept the 2011 assembly election. Her government received attention for its extensive social-welfare agenda, which included several subsidised "Amma"-branded goods such as canteens, bottled water, salt and cement. Three years into her tenure, she was convicted in a disproportionate-assets case, rendering her disqualified to hold office. She returned as chief minister after being acquitted in May 2015. In the 2016 assembly election, she became the first Tamil Nadu chief minister since MGR in 1984 to be voted back into office. That September, she fell severely ill and, following 75 days of hospitalisation, died on 5 December 2016 due to cardiac arrest."""

In [29]:
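import nltk
import re
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords

# The tokenizer, tagger and chunker used below need NLTK data packages,
# fetched once with nltk.download(...); package names vary by NLTK version.
lemmatizer = WordNetLemmatizer()
word = nltk.word_tokenize(data)  # split the raw text into word and punctuation tokens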

In [30]:
word

['Jayaram',
 'Jayalalithaa',
 '[',
 'a',
 ']',
 '(',
 '24',
 'February',
 '1948',
 '–',
 '5',
 'December',
 '2016',
 ')',
 'was',
 'an',
 'Indian',
 'politician',
 'and',
 'film',
 'actress',
 'who',
 'served',
 'six',
 'times',
 'as',
 'the',
 'Chief',
 'Minister',
 'of',
 'Tamil',
 'Nadu',
 'for',
 'over',
 'fourteen',
 'years',
 'between',
 '1991',
 'and',
 '2016',
 '.',
 'From',
 '9',
 'February',
 '1989',
 ',',
 'she',
 'was',
 'the',
 'general',
 'secretary',
 'of',
 'the',
 'All',
 'India',
 'Anna',
 'Dravida',
 'Munnetra',
 'Kazhagam',
 '(',
 'AIADMK',
 ')',
 ',',
 'a',
 'Dravidian',
 'party',
 'whose',
 'cadre',
 'revered',
 'her',
 'as',
 'their',
 '``',
 'Amma',
 "''",
 '(',
 'mother',
 ')',
 'and',
 'Puratchi',
 'Thalaivi',
 '(',
 'revolutionary',
 'leader',
 ')',
 '.',
 'Her',
 'critics',
 'in',
 'the',
 'media',
 'and',
 'the',
 'opposition',
 'accused',
 'her',
 'of',
 'fostering',
 'a',
 'personality',
 'cult',
 'and',
 'of',
 'demanding',
 'absolute',
 'loyalty',
 'fro

In [31]:
pos = nltk.pos_tag(word)
pos

[('Jayaram', 'NNP'),
 ('Jayalalithaa', 'NNP'),
 ('[', 'VBZ'),
 ('a', 'DT'),
 (']', 'NN'),
 ('(', '('),
 ('24', 'CD'),
 ('February', 'NNP'),
 ('1948', 'CD'),
 ('–', 'NNP'),
 ('5', 'CD'),
 ('December', 'NNP'),
 ('2016', 'CD'),
 (')', ')'),
 ('was', 'VBD'),
 ('an', 'DT'),
 ('Indian', 'JJ'),
 ('politician', 'NN'),
 ('and', 'CC'),
 ('film', 'NN'),
 ('actress', 'NN'),
 ('who', 'WP'),
 ('served', 'VBD'),
 ('six', 'CD'),
 ('times', 'NNS'),
 ('as', 'IN'),
 ('the', 'DT'),
 ('Chief', 'NNP'),
 ('Minister', 'NNP'),
 ('of', 'IN'),
 ('Tamil', 'NNP'),
 ('Nadu', 'NNP'),
 ('for', 'IN'),
 ('over', 'IN'),
 ('fourteen', 'JJ'),
 ('years', 'NNS'),
 ('between', 'IN'),
 ('1991', 'CD'),
 ('and', 'CC'),
 ('2016', 'CD'),
 ('.', '.'),
 ('From', 'IN'),
 ('9', 'CD'),
 ('February', 'NNP'),
 ('1989', 'CD'),
 (',', ','),
 ('she', 'PRP'),
 ('was', 'VBD'),
 ('the', 'DT'),
 ('general', 'JJ'),
 ('secretary', 'NN'),
 ('of', 'IN'),
 ('the', 'DT'),
 ('All', 'NNP'),
 ('India', 'NNP'),
 ('Anna', 'NNP'),
 ('Dravida', 'NNP'),
 ('
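The tags above show the footnote marker `[a]` split into three tokens, with `[` tagged `VBZ` and `]` tagged `NN`. One way to avoid this is to strip such markers with `re` before tokenizing. The sketch below assumes the only bracketed markers in the text are Wikipedia-style footnotes such as `[a]` or `[3]`; the helper name `strip_footnotes` is hypothetical and not part of the original notebook.

```python
import re

# Assumption: footnote markers are a single lowercase letter or a number in brackets.
FOOTNOTE = re.compile(r"\[(?:[a-z]|\d+)\]")

def strip_footnotes(text):
    """Remove Wikipedia-style footnote markers such as [a] or [3] from raw text."""
    return FOOTNOTE.sub("", text)

sample = "Jayaram Jayalalithaa[a] (24 February 1948 – 5 December 2016) was an Indian politician"
print(strip_footnotes(sample))
```

In the notebook, the cleaned text could then be passed to the tokenizer, for example `nltk.word_tokenize(strip_footnotes(data))`.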

In [32]:
nltk.help.upenn_tagset('CD')

CD: numeral, cardinal
    mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one forty-
    seven 1987 twenty '79 zero two 78-degrees eighty-four IX '60s .025
    fifteen 271,124 dozen quintillion DM2,000 ...
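
Once the meaning of a tag is known, the tagged list can be filtered by it. A minimal sketch that pulls out the cardinal numbers; `pos_sample` is only the first few pairs copied from the `pos` output above, and `tokens_with_tag` is a hypothetical helper name.

```python
# First tagged tokens copied from the pos output shown earlier in this notebook.
pos_sample = [
    ('Jayaram', 'NNP'), ('Jayalalithaa', 'NNP'), ('[', 'VBZ'), ('a', 'DT'), (']', 'NN'),
    ('(', '('), ('24', 'CD'), ('February', 'NNP'), ('1948', 'CD'), ('–', 'NNP'),
    ('5', 'CD'), ('December', 'NNP'), ('2016', 'CD'), (')', ')'),
]

def tokens_with_tag(tagged, tag):
    """Return the tokens whose POS tag equals `tag`, in their original order."""
    return [token for token, t in tagged if t == tag]

cardinals = tokens_with_tag(pos_sample, 'CD')
print(cardinals)
```

The same call on the full list would be `tokens_with_tag(pos, 'CD')`.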


In [33]:
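# binary=True collapses every entity type into a single generic "NE" label
name_labels = nltk.ne_chunk(pos, binary=True)
entity = []
labels = []
for n in name_labels:
    # Entity chunks are subtrees with a label(); plain tokens are (word, tag) tuples
    if hasattr(n, 'label'):
        entity.append(' '.join(i[0] for i in n))
        labels.append(n.label())

# Deduplicate (entity, label) pairs; set() does not preserve order
entity_df = list(set(zip(entity, labels)))
df = pd.DataFrame(entity_df)
df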

Unnamed: 0,0,1
0,Indian,NE
1,Tamil,NE
2,Kannada,NE
3,Atal Bihari Vajpayee,NE
4,AIADMK,NE
5,Jayalalithaa,NE
6,MGR,NE
7,Jayaram Jayalalithaa,NE
8,Dravidian,NE
9,Puratchi Thalaivi,NE


In [34]:
# binary=False keeps the chunker's specific categories (PERSON, ORGANIZATION, GPE, ...)
specific_name = nltk.ne_chunk(pos, binary=False)
recognized = []
category = []
for name in specific_name:
    if hasattr(name, 'label'):
        recognized.append(' '.join(w[0] for w in name))
        category.append(name.label())
# Deduplicate (entity, category) pairs
named_entity = list(set(zip(recognized, category)))

In [35]:
dataframe_entity = pd.DataFrame(named_entity)

In [36]:
dataframe_entity.columns = ['Indian_BJP_Entities','Labels']

In [37]:
dataframe_entity

Unnamed: 0,Indian_BJP_Entities,Labels
0,Telugu,GPE
1,Jayalalithaa,ORGANIZATION
2,Opposition,ORGANIZATION
3,Indian,GPE
4,MGR,ORGANIZATION
5,Parliament,ORGANIZATION
6,Jayaram,PERSON
7,Leader,ORGANIZATION
8,India,GPE
9,Tamil,GPE
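
Cells 33 and 34 run the same loop, differing only in the `binary` flag. A minimal sketch of folding that loop into one function follows. `extract_entities` is a hypothetical name, and `FakeChunk` is a stand-in for `nltk.Tree` (an assumption made so the demo runs without NLTK data); the function itself only relies on entity chunks having a `label()` method, as the original loops do.

```python
def extract_entities(chunked):
    """Collect unique (entity text, label) pairs from an ne_chunk-style tree."""
    pairs = set()
    for node in chunked:
        # Entity chunks expose label(); ordinary tokens are plain (word, tag) tuples.
        if hasattr(node, 'label'):
            pairs.add((' '.join(word for word, _tag in node), node.label()))
    return sorted(pairs)


class FakeChunk(list):
    """Hypothetical stand-in for nltk.Tree, used only for this demo."""

    def __init__(self, label, leaves):
        super().__init__(leaves)
        self._label = label

    def label(self):
        return self._label


demo = [
    FakeChunk('PERSON', [('Jayaram', 'NNP')]),
    ('was', 'VBD'),
    FakeChunk('GPE', [('Indian', 'JJ')]),
    FakeChunk('PERSON', [('Jayaram', 'NNP')]),  # duplicate, removed by the set
]
result = extract_entities(demo)
print(result)
```

In the notebook the call would be `extract_entities(nltk.ne_chunk(pos, binary=False))`, and the result can be passed straight to `pd.DataFrame`.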


In [38]:
dataframe_entity.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 23 entries, 0 to 22
Data columns (total 2 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Indian_BJP_Entities  23 non-null     object
 1   Labels               23 non-null     object
dtypes: object(2)
memory usage: 496.0+ bytes


In [39]:
import spacy
from spacy import displacy


In [40]:
nlp = spacy.load("en_core_web_sm")

In [41]:
word = nlp(data)  # rebinds `word`: now a spaCy Doc, no longer the NLTK token list

In [42]:
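entit = []
lab = []

# Doc.ents holds the entity spans found by the pipeline's NER component
for w in word.ents:
    print(w.text, w.label_)
    entit.append(w.text)
    lab.append(w.label_)
# Deduplicate (text, label) pairs; still a plain list until wrapped in a DataFrame
df_ent = list(set(zip(entit, lab)))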

Jayaram ORG
24 February 1948 DATE
Indian NORP
six CARDINAL
Tamil Nadu ORG
fourteen years DATE
between 1991 and 2016 DATE
9 February 1989 DATE
AIADMK ORG
Dravidian NORP
Puratchi Thalaivi PERSON
AIADMK ORG
first ORDINAL
the mid-1960s DATE
Jayalalithaa PERSON
140 CARDINAL
between 1961 and 1980 DATE
Kannada PERSON
Jayalalithaa PERSON
Queen of Tamil Cinema".[4] WORK_OF_ART
M. G. Ramachandran PERSON
Tamil GPE
1982 DATE
MGR ORG
Jayalalithaa PERSON
AIADMK ORG
a few years DATE
AIADMK ORG
Rajya Sabha PERSON
India GPE
Parliament ORG
MGR ORG
1987 DATE
Jayalalithaa ORG
Janaki Ramachandran PERSON
MGR ORG
AIADMK ORG
1989 DATE
Leader of the Opposition to the DMK ORG
Karunanidhi PERSON
1991 DATE
Tamil Nadu's PERSON
first ORDINAL
Jayalalithaa ORG
1995 DATE
1996 DATE
AIADMK ORG
Jayalalithaa PERSON
Karunanidhi PERSON
1998 DATE
AIADMK ORG
Bihari Vajpayee PERSON
1998–99 CARDINAL
a year later DATE
AIADMK ORG
2001 DATE
Jayalalithaa PERSON
a few months DATE
September 2001 DATE
O. Panneerselvam PERSON
six month
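
The `set(zip(...))` step discards how often each entity was found. A minimal sketch of counting mentions with `collections.Counter` instead; the `pairs` list is a short sample copied from the printed output above, not the full result.

```python
from collections import Counter

# Sample (text, label) pairs copied from the printed spaCy output.
pairs = [
    ('AIADMK', 'ORG'), ('Dravidian', 'NORP'), ('Puratchi Thalaivi', 'PERSON'),
    ('AIADMK', 'ORG'), ('Jayalalithaa', 'PERSON'), ('MGR', 'ORG'),
    ('Jayalalithaa', 'PERSON'), ('AIADMK', 'ORG'), ('MGR', 'ORG'),
]

counts = Counter(pairs)
for (text, label), n in counts.most_common(3):
    print(text, label, n)
```

On the full data the counter could be built as `Counter(zip(entit, lab))`.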

In [43]:
df_ent = pd.DataFrame(df_ent)
df_ent.columns = ['entity','generated_labels']
df_ent

Unnamed: 0,entity,generated_labels
0,"Queen of Tamil Cinema"".[4]",WORK_OF_ART
1,Dravidian,NORP
2,1982,DATE
3,Bihari Vajpayee,PERSON
4,Jayalalithaa,ORG
5,September,DATE
6,MGR,ORG
7,1998,DATE
8,Parliament,ORG
9,Tamil Nadu's,PERSON


In [46]:
spacy.explain('CARDINAL')

'Numerals that do not fall under another type'

In [50]:
displacy.render(word, style='ent', jupyter=True)  # highlight the entities inline in the notebook