# Parser & NER

In [1]:
import spacy

In [2]:
nlp=spacy.load('en_core_web_sm')

In [3]:
type(nlp)

spacy.lang.en.English

In [4]:
doc1=nlp('Let’s start by differentiating between data analytics and traditional analytics. The terms are often used interchangeably, but a distinction does exist. Traditional data analytics refers to the process of analyzing massive amounts of collected data to get insights and predictions. Business data analytics (sometimes called business analytics) takes that idea, but puts it in the context of business insight, often with prebuilt business content and tools that expedite the analysis process. Specifically, business analytics refers to: Taking in and processing historical business data. Analyzing that data to identify trends, patterns, and root causes. Making data-driven business decisions based on those insights. In other words, data analytics is more of a general description of the modern analytics process. Business analytics implies a narrower focus and has functionally become more prevalent and more important for organizations around the globe as the overall volume of data has increased.')

In [5]:
doc1

Let’s start by differentiating between data analytics and traditional analytics. The terms are often used interchangeably, but a distinction does exist. Traditional data analytics refers to the process of analyzing massive amounts of collected data to get insights and predictions. Business data analytics (sometimes called business analytics) takes that idea, but puts it in the context of business insight, often with prebuilt business content and tools that expedite the analysis process. Specifically, business analytics refers to: Taking in and processing historical business data. Analyzing that data to identify trends, patterns, and root causes. Making data-driven business decisions based on those insights. In other words, data analytics is more of a general description of the modern analytics process. Business analytics implies a narrower focus and has functionally become more prevalent and more important for organizations around the globe as the overall volume of data has increased.

In [6]:
len(doc1)

167
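
The count of 167 covers every token, punctuation included. As a rough sketch of that point, the snippet below tokenizes one sentence from the passage and separates word tokens from punctuation. It uses `spacy.blank("en")`, a tokenizer-only pipeline picked here purely so that no trained model is needed; this is an assumption for illustration and not the pipeline the notebook loads.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) stands in for en_core_web_sm.
nlp_blank = spacy.blank("en")
sample = nlp_blank("The terms are often used interchangeably, but a distinction does exist.")

# Every token counts toward len(), including the comma and the full stop.
punct = [token.text for token in sample if token.is_punct]
words = [token.text for token in sample if not token.is_punct]

print(len(sample), len(words), punct)
```

Filtering on `token.is_punct` in the same way gives a word count for `doc1` if punctuation should be left out.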

# Tokenizer

In [7]:
for token in doc1:
    print(token)

Let
’s
start
by
differentiating
between
data
analytics
and
traditional
analytics
.
The
terms
are
often
used
interchangeably
,
but
a
distinction
does
exist
.
Traditional
data
analytics
refers
to
the
process
of
analyzing
massive
amounts
of
collected
data
to
get
insights
and
predictions
.
Business
data
analytics
(
sometimes
called
business
analytics
)
takes
that
idea
,
but
puts
it
in
the
context
of
business
insight
,
often
with
prebuilt
business
content
and
tools
that
expedite
the
analysis
process
.
Specifically
,
business
analytics
refers
to
:
Taking
in
and
processing
historical
business
data
.
Analyzing
that
data
to
identify
trends
,
patterns
,
and
root
causes
.
Making
data
-
driven
business
decisions
based
on
those
insights
.
In
other
words
,
data
analytics
is
more
of
a
general
description
of
the
modern
analytics
process
.
Business
analytics
implies
a
narrower
focus
and
has
functionally
become
more
prevalent
and
more
important
for
organizations
around
the
globe
as
the
overall
volume
of

## Tagger
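Provides part-of-speech (POS) tags. A POS tag describes the grammatical role of a word in a sentence, such as noun, verb, or adjective. `token.tag_` returns the fine-grained tag (for example `NNS`), while `token.pos_` returns the coarse-grained one.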

In [8]:
for token in doc1:
    print(token.text,'==>',token.tag_)

Let ==> VB
’s ==> PRP
start ==> VB
by ==> IN
differentiating ==> VBG
between ==> IN
data ==> NNS
analytics ==> NNS
and ==> CC
traditional ==> JJ
analytics ==> NNS
. ==> .
The ==> DT
terms ==> NNS
are ==> VBP
often ==> RB
used ==> VBN
interchangeably ==> RB
, ==> ,
but ==> CC
a ==> DT
distinction ==> NN
does ==> VBZ
exist ==> VB
. ==> .
Traditional ==> JJ
data ==> NNS
analytics ==> NNS
refers ==> VBZ
to ==> IN
the ==> DT
process ==> NN
of ==> IN
analyzing ==> VBG
massive ==> JJ
amounts ==> NNS
of ==> IN
collected ==> JJ
data ==> NNS
to ==> TO
get ==> VB
insights ==> NNS
and ==> CC
predictions ==> NNS
. ==> .
Business ==> NN
data ==> NNS
analytics ==> NNS
( ==> -LRB-
sometimes ==> RB
called ==> VBN
business ==> NN
analytics ==> NNS
) ==> -RRB-
takes ==> VBZ
that ==> DT
idea ==> NN
, ==> ,
but ==> CC
puts ==> VBZ
it ==> PRP
in ==> IN
the ==> DT
context ==> NN
of ==> IN
business ==> NN
insight ==> NN
, ==> ,
often ==> RB
with ==> IN
prebuilt ==> NN
business ==> NN
content ==> NN
and ==> CC
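
The fine-grained tags in this output are terse. One way to decode them, sketched below, is to pass each tag to `spacy.explain`, which the notebook also uses later for dependency and entity labels. The list of tags is simply a hand-picked sample from the output above.

```python
import spacy

# Fine-grained tags copied from the output above.
tags = ["VB", "PRP", "VBG", "IN", "NNS", "JJ", "DT"]

# spacy.explain is a glossary lookup, so no trained model has to be loaded.
explanations = {tag: spacy.explain(tag) for tag in tags}
for tag, meaning in explanations.items():
    print(tag, "==>", meaning)
```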

# Parser

The parser is the component of the spaCy pipeline that works out how each token depends on the others. It identifies the relationships between the words in a text and arranges them into a tree, a process known as dependency parsing. Knowing which words depend on which, and in what way, gives a deeper view of the structure and meaning of the text. This information can then be used for tasks such as sentiment analysis, summarization, and question answering.

In [9]:
from spacy import displacy
displacy.render(doc1,style='dep')

In [10]:
spacy.explain('nsubj')

'nominal subject'

In [11]:
spacy.explain('ccomp')

'clausal complement'

In [12]:
for token in doc1:
    print(token.text,'==>',token.dep_)

Let ==> ROOT
’s ==> nsubj
start ==> ccomp
by ==> prep
differentiating ==> pcomp
between ==> prep
data ==> compound
analytics ==> pobj
and ==> cc
traditional ==> amod
analytics ==> conj
. ==> punct
The ==> det
terms ==> nsubjpass
are ==> auxpass
often ==> advmod
used ==> ROOT
interchangeably ==> advmod
, ==> punct
but ==> cc
a ==> det
distinction ==> nsubj
does ==> aux
exist ==> conj
. ==> punct
Traditional ==> amod
data ==> compound
analytics ==> nsubj
refers ==> ROOT
to ==> prep
the ==> det
process ==> pobj
of ==> prep
analyzing ==> pcomp
massive ==> amod
amounts ==> dobj
of ==> prep
collected ==> amod
data ==> pobj
to ==> aux
get ==> advcl
insights ==> dobj
and ==> cc
predictions ==> conj
. ==> punct
Business ==> compound
data ==> compound
analytics ==> nsubj
( ==> punct
sometimes ==> advmod
called ==> advcl
business ==> compound
analytics ==> oprd
) ==> punct
takes ==> ROOT
that ==> det
idea ==> dobj
, ==> punct
but ==> cc
puts ==> conj
it ==> dobj
in ==> prep
the ==> det
context ==
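
The labels alone do not show which word each token attaches to. The sketch below rebuilds a small tree for the clause "The terms are often used interchangeably" and walks it with `token.head` and `token.children`. The dependency labels are taken from the output above, but the head positions are assigned by hand as an assumption (the output does not print them), and the `Doc(..., heads=..., deps=...)` form assumes spaCy v3, where heads are given as token positions.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

words = ["The", "terms", "are", "often", "used", "interchangeably"]
deps = ["det", "nsubjpass", "auxpass", "advmod", "ROOT", "advmod"]
# Assumption: hand-assigned head positions; the root points at itself.
heads = [1, 4, 4, 4, 4, 4]

doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

for token in doc:
    children = [child.text for child in token.children]
    print(token.text, "==>", token.dep_, "| head:", token.head.text, "| children:", children)

# The root is the one token labelled ROOT.
root = [token for token in doc if token.dep_ == "ROOT"][0]
```

With the notebook's own `doc1`, the same loop would print the heads and children that the parser predicted, with no hand-built `Doc` needed.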

## Noun-Chunks
Noun chunks are base noun phrases: a noun together with the words that describe it. They are useful for extracting meaningful information from text and for identifying the key entities mentioned in a document. `doc.noun_chunks` iterates over the base noun phrases in the document.

In [13]:
for chunk in doc1.noun_chunks:
    print(chunk.text,'==>',chunk.label_)

’s ==> NP
data analytics ==> NP
traditional analytics ==> NP
The terms ==> NP
a distinction ==> NP
Traditional data analytics ==> NP
the process ==> NP
massive amounts ==> NP
collected data ==> NP
insights ==> NP
predictions ==> NP
Business data analytics ==> NP
business analytics ==> NP
that idea ==> NP
it ==> NP
the context ==> NP
business insight ==> NP
prebuilt business content ==> NP
tools ==> NP
that ==> NP
the analysis process ==> NP
business analytics ==> NP
historical business data ==> NP
that data ==> NP
trends ==> NP
patterns ==> NP
root causes ==> NP
data-driven business decisions ==> NP
those insights ==> NP
other words ==> NP
data analytics ==> NP
a general description ==> NP
the modern analytics process ==> NP
Business analytics ==> NP
a narrower focus ==> NP
organizations ==> NP
the globe ==> NP
the overall volume ==> NP
data ==> NP


## NER - Named Entity Recognizer
Named entity recognition (NER) identifies named entities in a text and assigns each one to a predefined category, such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, or percentages. The goal of NER is to extract structured information from unstructured text. This information can then be used for NLP tasks such as information retrieval, question answering, and event extraction.

The India men's national cricket team, also known as Team India or the Men in Blue,[10] represents India in men's international cricket. It is governed by the Board of Control for Cricket in India (BCCI), and is a Full Member of the International Cricket Council (ICC) with Test, One Day International (ODI) and Twenty20 International (T20I) status. Cricket was introduced to the Indian subcontinent by British sailors in the 18th century, and the first cricket club was established in 1792. India's national cricket team played its first international match on 25 June 1932 in a Lord's Test, becoming the sixth team to be granted Test cricket status. India had to wait until 1952, almost twenty years, for its first Test victory. In its first fifty years of international cricket, success was limited, with only 35 wins in 196 Tests. The team, however, gained strength in the 1970s with the emergence of players like Sunil Gavaskar, Gundappa Viswanath, Kapil Dev, and the Indian spin quartet.

In [15]:
doc2=nlp('The India mens national cricket team, also known as Team India or the Men in Blue,[10] represents India in mens international cricket. It is governed by the Board of Control for Cricket in India (BCCI), and is a Full Member of the International Cricket Council (ICC) with Test, One Day International (ODI) and Twenty20 International (T20I) status. Cricket was introduced to the Indian subcontinent by British sailors in the 18th century, and the first cricket club was established in 1792. India national cricket team played its first international match on 25 June 1932 in a Lord Test, becoming the sixth team to be granted Test cricket status. India had to wait until 1952, almost twenty years, for its first Test victory. In its first fifty years of international cricket, success was limited, with only 35 wins in 196 Tests. The team, however, gained strength in the 1970s with the emergence of players like Sunil Gavaskar, Gundappa Viswanath, Kapil Dev, and the Indian spin quartet.')

In [16]:
for ent in doc2.ents:
    print(ent.text,'==>',ent.label_)

India ==> GPE
Team India ==> ORG
India ==> GPE
the Board of Control for Cricket ==> ORG
India ==> GPE
BCCI ==> ORG
the International Cricket Council ==> ORG
One Day ==> DATE
Twenty20 International ==> ORG
Indian ==> NORP
British ==> NORP
the 18th century ==> DATE
first ==> ORDINAL
1792 ==> DATE
India national cricket team ==> ORG
first ==> ORDINAL
25 June 1932 ==> DATE
sixth ==> ORDINAL
India ==> GPE
1952 ==> DATE
almost twenty years ==> DATE
first ==> ORDINAL
its first fifty years ==> DATE
196 ==> CARDINAL
the 1970s ==> DATE
Sunil Gavaskar ==> PERSON
Gundappa Viswanath ==> PERSON
Kapil Dev ==> PERSON
Indian ==> NORP
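
A flat list of entities is easier to read once it is grouped by label. The following is a minimal sketch using only the standard library. The pairs are a subset copied from the output above; in the notebook they could be built with `[(ent.text, ent.label_) for ent in doc2.ents]`.

```python
from collections import defaultdict

# (text, label) pairs copied from the output above (a subset, for illustration).
pairs = [
    ("India", "GPE"), ("Team India", "ORG"), ("India", "GPE"), ("BCCI", "ORG"),
    ("British", "NORP"), ("the 18th century", "DATE"), ("1792", "DATE"),
    ("Sunil Gavaskar", "PERSON"), ("Gundappa Viswanath", "PERSON"),
    ("Kapil Dev", "PERSON"), ("Indian", "NORP"),
]

by_label = defaultdict(list)
for text, label in pairs:
    # Keep only the first occurrence of each entity text under a label.
    if text not in by_label[label]:
        by_label[label].append(text)

for label, texts in by_label.items():
    print(label, "==>", texts)
```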


In [17]:
spacy.explain('NORP')

'Nationalities or religious or political groups'

# Visualising  

In [18]:
displacy.render(doc2,style='ent')  #named entities
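
When a text contains many entity types, displaCy can be asked to highlight only some of them through its `ents` option. The sketch below shows the idea on a one-sentence example. To stay runnable without a trained model it uses displaCy's manual mode, so the sentence and the entity offsets are made up for illustration and supplied by hand.

```python
from spacy import displacy

text = "Sunil Gavaskar played for India."
# Assumption: entity spans are written by hand (manual mode) instead of coming from a model.
ents = [
    {"start": text.index("Sunil"), "end": text.index(" played"), "label": "PERSON"},
    {"start": text.index("India"), "end": text.index("India") + len("India"), "label": "GPE"},
]

html = displacy.render(
    {"text": text, "ents": ents, "title": None},
    style="ent",
    manual=True,
    jupyter=False,                 # return the markup as a string instead of displaying it
    options={"ents": ["PERSON"]},  # highlight PERSON entities only
)
```

In the notebook the same option can be passed directly, for example `displacy.render(doc2, style='ent', options={'ents': ['PERSON', 'GPE']})`.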

## NER with Web Data

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It is commonly used for web scraping.

In [19]:
import requests

In [20]:
from bs4 import BeautifulSoup

In [21]:
url='https://en.wikipedia.org/wiki/India'

In [22]:
print(url)

https://en.wikipedia.org/wiki/India


In [23]:
request = requests.get(url)
print(request)

<Response [200]>


In [24]:
request=request.text
print(request)

<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>India - Wikipedia</title>
<script>document.documentElement.className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled";(function(){var cookie=document.cookie.match(/(?:^|; )

In [25]:
soup_request=BeautifulSoup(request)
print(soup_request)

<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>India - Wikipedia</title>
<script>document.documentElement.className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled";(function(){var cookie=document.cookie.match(/(?:^|; )

In [26]:
text=soup_request.body.text
print(text)


Jump to content





Toggle sidebar












Search

















Create account





Personal tools



Create account
Log in




				Pages for logged out editors learn more



TalkContributions











Navigation


Main pageContentsCurrent eventsRandom articleAbout WikipediaContact usDonate




Contribute


HelpLearn to editCommunity portalRecent changesUpload file




Tools


What links hereRelated changesUpload fileSpecial pagesPermanent linkPage informationCite this pageWikidata item




Print/export


Download as PDFPrintable version




In other projects


Wikimedia CommonsWikinewsWikiquoteWikivoyage




Languages

On this Wikipedia the language links are at the top of the page across from the article title. Go to top.















Contents
move to sidebar
hide




(Top)





1Etymology







2History







2.1Ancient India







2.2Medieval India







2.3Early modern India







2.4Modern India









3Geography







4Biodiversity







5Politics and gover

In [27]:
type(text)

str
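
The body text printed above is dominated by navigation and sidebar strings ("Jump to content", "Create account", and so on), and strings of that kind tend to confuse an NER model. One possible way to reduce the noise, sketched here on a small made-up HTML snippet and not on the live page, is to keep only the text inside `<p>` tags. The `"html.parser"` argument selects Python's built-in parser and is a choice made for this sketch; the notebook does not say which parser it used.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page: navigation, two paragraphs, a footer.
html = """<html><body>
<nav><a href="#">Main page</a><a href="#">Contents</a></nav>
<div id="content">
<p>India is a country in South Asia.</p>
<p>Its capital is New Delhi.</p>
</div>
<footer>Privacy policy</footer>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# Collect paragraph text only; menus, footers and scripts are skipped.
paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
article_text = "\n".join(paragraphs)
print(article_text)
```

Applied to `soup_request`, the same list comprehension would produce a cleaner string to pass to `nlp` than `soup_request.body.text`.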

## Converting text to doc

In [28]:
doc4=nlp(text)

In [29]:
type(doc4)

spacy.tokens.doc.Doc

In [30]:
# Print every named entity found in doc4, with its label
for ent in doc4.ents:
    print(ent.text,'==>',ent.label_)

Search

















Create ==> ORG
usDonate ==> ORG
HelpLearn ==> ORG
Download ==> WORK_OF_ART
PDFPrintable ==> ORG
Wikimedia CommonsWikinewsWikiquoteWikivoyage ==> PRODUCT
Wikipedia ==> PERSON
2History ==> CARDINAL
India ==> GPE
India ==> GPE
2.3Early ==> CARDINAL
India ==> GPE
India ==> GPE
5.1Politics ==> CARDINAL
5.3.1States ==> CARDINAL
5.3.2Union ==> CARDINAL
7.3Socio ==> CARDINAL
8Demographics ==> CARDINAL
11Notes ==> CARDINAL
14External ==> CARDINAL
292 ==> CARDINAL
AcèhАдыгэбзэАдыгабзэAfrikaansAlemannischአማርኛAnarâškielâÆngliscАԥсшәаالعربيةAragonésܐܪܡܝܐԱրեւմտահայերէնArmãneashtiArpetanঅসমীয়াAsturianuAtikamekwअवधीAvañe'ẽАварAymar aruAzərbaycancaتۆرکجهBasa ==> PRODUCT
CentralBislamaБългарскиBoarischབོད་ཡིགBosanskiBrezhonegБуряадCatalàЧӑвашлаCebuanoČeštinaChamoruChavacano de ZamboangaChi-ChewaChiShonaChiTumbukaCorsuCymraegDagbanliDanskالدارجةDavvisámegiellaDeitschDeutschދިވެހިބަސްDiné bizaadDolnoserbskiडोटेलीཇོང་ཁEestiΕλληνικάЭрзяньEspañolEsperantoEstremeñuEuskaraEʋegbeفارسیFi

British ==> NORP
the East India Company ==> ORG
India ==> GPE
the Republic of India ==> GPE
India ==> GPE
between 1848 and 1885 ==> DATE
1848 ==> DATE
Dalhousie ==> PERSON
the East India Company ==> ORG
telegraph ==> ORG
Indian ==> NORP
1857 ==> DATE
Fed ==> ORG
British ==> NORP
India ==> GPE
1858 ==> DATE
the East India Company ==> ORG
India ==> GPE
British ==> NORP
British ==> NORP
the decades ==> DATE
India ==> GPE
the Indian National Congress ==> ORG
1885.[146][147][148][149 ==> CARDINAL
the second half of the 19th century ==> DATE
Indian ==> NORP
Punjab ==> ORG
Indian ==> NORP
1909 ==> DATE
British ==> NORP
Indian ==> NORP
Mahatma Gandhi ==> PERSON
Mumbai ==> GPE
6 July 1946 ==> DATE
World War I ==> EVENT
approximately one million ==> CARDINAL
Indians ==> NORP
British ==> NORP
Indian ==> NORP
Mahatma Gandhi ==> PERSON
the 1930s ==> DATE
British ==> NORP
the Indian National Congress ==> ORG
Indian ==> NORP
World War II ==> EVENT
Congress ==> ORG
Muslim ==> NORP
1947 ==> DATE
India 

1768.[271 ==> CARDINAL
China ==> GPE
1964 ==> DATE
Pakistan ==> GPE
1965 ==> DATE
India ==> GPE
India ==> GPE
first ==> ORDINAL
1974 ==> DATE
1998 ==> DATE
India ==> GPE
the Nuclear Non-Proliferation Treaty ==> LAW
discriminatory.[273 ==> GPE
India ==> GPE
first ==> ORDINAL
Minimum Credible Deterrence ==> WORK_OF_ART
fifth ==> ORDINAL
the Cold War ==> EVENT
India ==> GPE
the United States ==> GPE
the European Union.[279] ==> ORG
India ==> GPE
the United States ==> GPE
India ==> GPE
the Nuclear Non-Proliferation Treaty ==> ORG
the International Atomic Energy Agency ==> ORG
the Nuclear Suppliers Group ==> ORG
India ==> GPE
India ==> GPE
sixth ==> ORDINAL
state.[280 ==> NORP
India ==> GPE
Canada.[284 ==> GPE
Narendra Modi ==> PERSON
India ==> GPE
Enrique Peña ==> PERSON
Mexico ==> GPE
Mexico ==> GPE
2016 ==> DATE
India ==> GPE
1.45 million ==> MONEY
second ==> ORDINAL
the Indian Army ==> ORG
the Indian Navy ==> ORG
the Indian Air Force ==> ORG
the Indian Coast ==> ORG
Indian ==> NORP
2011

Krishna Killing the Horse Demon Keshi ==> ORG
5th century






Elephanta Caves ==> DATE
Shiva ==> ORG
18 feet ==> QUANTITY
5.5 ==> CARDINAL
550 ==> CARDINAL
Chola ==> ORG
Nataraja ==> ORG
Dance ==> ORG
Tamil Nadu ==> PERSON
10th or ==> DATE
11th century ==> DATE
Mewar ==> GPE
Balchand ==> GPE
Milkmaids ==> ORG
Kangra ==> ORG
1775–1785 ==> CARDINAL
India ==> GPE
The Taj Mahal ==> ORG
Yamuna ==> LOC
two ==> CARDINAL
Indian ==> NORP
the Taj Mahal ==> ORG
Indo-Islamic Mughal ==> ORG
South Indian ==> NORP
Vastu shastra ==> PERSON
Mamuni Mayan,[402 ==> PERSON
constructs.[404 ==> ORG
Hindu ==> NORP
The Taj Mahal ==> ORG
between 1631 and 1648 ==> DATE
Shah Jahan ==> PERSON
Muslim ==> NORP
India ==> GPE
one ==> CARDINAL
British ==> NORP
the late 19th century ==> DATE
Indo-Islamic ==> ORG
Indian ==> NORP
India ==> GPE
Rigveda ==> ORG
Mahābhārata ==> PERSON
c.  ==> PERSON
Ramayana ==> PERSON
c. ==> NORP
Abhijñānaśākuntalam ==> ORG
Recognition ==> ORG
Kālidāsa ==> GPE
c. ==> NORP
Mahākāvya ==> GP

2005 ==> DATE
National Portal of India ==> ORG
4 February 2017 ==> DATE
1 March 2017 ==> DATE
The National Anthem of India ==> ORG
Jana Gana Mana ==> PERSON
Bengali ==> NORP
Rabindranath Tagore ==> GPE
Hindi ==> GPE
the Constituent Assembly ==> ORG
the National Anthem of ==> ORG
India ==> GPE
24 January 1950 ==> DATE
India ==> GPE
Gana Mana' ==> PERSON
14 August 2012 ==> DATE
17 April 2019 ==> DATE
7 June 2019 ==> DATE
Wolpert 2003 ==> PERSON
1 ==> CARDINAL
Constituent Assembly of India ==> ORG
1950 ==> DATE
Ministry of Home Affairs ==> ORG
1960 ==> DATE
National Portal of India ==> ORG
30 August 2013 ==> DATE
23 August 2013 ==> DATE
Constitutional Provisions – Official Language Related Part-17 of the Constitution of India ==> WORK_OF_ART
India ==> GPE
18 April 2021 ==> DATE
18 April 2021 ==> DATE
25 January 2010 ==> DATE
India ==> GPE
Gujarat High Court ==> ORG
The Times of India ==> ORG
18 March 2014 ==> DATE
5 May 2014 ==> DATE
Learning with the Times: India ==> WORK_OF_ART
The Time

National Informatics Centre ==> ORG
17 January 2022 ==> DATE
Karanth & Gopal 2005 ==> ORG
374 ==> CARDINAL
India ==> GPE
Oxford English Dictionary ==> ORG
3rd ed. ==> DATE
2009 ==> DATE
Thieme 1970 ==> EVENT
pp ==> GPE
447–450 ==> CARDINAL
Kuiper 2010 ==> ORG
86 ==> CARDINAL
Clémentin-Ojha ==> ORG
2014 ==> DATE
Constitution ==> LAW
India ==> GPE
PDF ==> ORG
Ministry of Law and Justice ==> ORG
1 December 2007 ==> DATE
PDF ==> ORG
9 September 2014 ==> DATE
3 March 2012 ==> DATE
Article 1(1 ==> LAW
India ==> GPE
Bharat ==> ORG
Dwijendra Narayan ==> PERSON
2014 ==> DATE
Rethinking Hindu Identity ==> ORG
Routledge ==> GPE
ISBN ==> ORG
978 ==> CARDINAL
2017 ==> DATE
253 ==> CARDINAL
2003 ==> DATE
Hindustan ==> ORG
Encyclopædia Britannica ==> PERSON
17 July 2011 ==> DATE
Coningham & Young 2015 ==> ORG
104–105 ==> CARDINAL
Kulke & Rothermund 2004 ==> ORG
21–23 ==> CARDINAL
Singh 2009 ==> ORG
181 ==> CARDINAL
Possehl 2003 ==> ORG
Singh 2009 ==> GPE
255 ==> CARDINAL
2009 ==> DATE
186–187 ==> CAR

358 ==> CARDINAL
ISBN ==> ORG
978 ==> CARDINAL
Ficus religiosa ==> ORG
Crame & Owen 2002 ==> ORG
142 ==> CARDINAL
Karanth 2006 ==> EVENT
Singh ==> GPE
M. ==> ORG
Kumar, A. & Molur ==> ORG
S. (2008 ==> ORG
IUCN ==> ORG
2008 ==> DATE
T44694A10927987 ==> ORG
doi:10.2305/IUCN.UK.2008.RLTS.T44694A10927987.en ==> ORG
Johann ==> ORG
Semnopithecus ==> ORG
29 August 2018 ==> DATE
27 August 2018 ==> DATE
S.D. Biju ==> PERSON
Sushil Dutta ==> GPE
M.S. Ravichandran Karthikeyan Vasudevan ==> ORG
S.P. Vijayakumar ==> PERSON
Chelmala Srinivasulu ==> PERSON
Gajanan Dasaramji Bhuddhe ==> PERSON
2004 ==> DATE
IUCN ==> ORG
IUCN ==> ORG
2004 ==> DATE
Frost ==> ORG
Darrel R. ==> PERSON
2015 ==> DATE
1876 ==> DATE
Amphibian Species ==> PERSON
6.0 ==> CARDINAL
American Museum of Natural History ==> ORG
21 July 2015 ==> DATE
13 September 2015 ==> DATE
Mace ==> PERSON
1994 ==> DATE
Lovette ==> PERSON
Irby J. ==> PERSON
Fitzpatrick ==> ORG
John W. ==> PERSON
2016 ==> DATE
John Wiley & Sons ==> ORG
599 ==> MONEY

India ==> GPE
the UK Trade and Investment 2011 ==> ORG
2011 ==> DATE
India ==> GPE
Differding.com ==> ORG
24 June 2013 ==> DATE
23 February 2014 ==> DATE
4 April 2014 ==> DATE
India ==> GPE
Total Power Generation Capacity Crosses ==> ORG
300 ==> CARDINAL
1 August 2016 ==> DATE
16 June 2017 ==> DATE
17 October 2021 ==> DATE
Rowlatt ==> ORG
Justin ==> ORG
12 May 2020 ==> DATE
India ==> GPE
first ==> ORDINAL
four decades ==> DATE
BBC News ==> ORG
3 December 2020 ==> DATE
USAID ==> ORG
September 2018 ==> DATE
Greenhouse Gas Emissions ==> WORK_OF_ART
India ==> GPE
PDF ==> ORG
10 June 2021 ==> DATE
UN Environment Programme ==> ORG
2019 ==> DATE
Emissions Gap Report ==> WORK_OF_ART
2019 ==> DATE
10 June 2021 ==> DATE
India 2020 – Analysis ==> WORK_OF_ART
International Energy Agency ==> ORG
3 December 2020 ==> DATE
Chan ==> PERSON
Margaret ==> PERSON
11 February 2014 ==> DATE
India ==> GPE
New Delhi ==> GPE
India ==> GPE
World Health Organization ==> ORG
17 ==> CARDINAL
October 2021 ==> DATE
I

2002 ==> DATE
Evolution of Indian Costume ==> WORK_OF_ART
Central Asia ==> LOC
India ==> GPE
Rahman ==> PERSON
India ==> GPE
China ==> GPE
Central ==> GPE
West Asia ==> LOC
Oxford University Press ==> ORG
ISBN ==> ORG
978 ==> CARDINAL
2011 ==> DATE
Concise Oxford ==> PERSON
English ==> LANGUAGE
Oxford University Press ==> ORG
1272 ==> DATE
ISBN ==> ORG
978 ==> CARDINAL
3 ==> CARDINAL
September 2019

^ Stevenson ==> DATE
2011 ==> DATE
Concise Oxford ==> PERSON
English ==> LANGUAGE
Oxford University Press ==> ORG
774 ==> MONEY
ISBN ==> ORG
978 ==> CARDINAL
John T. ==> PERSON
John Thompson ==> PERSON
1884 ==> DATE
Urdu ==> GPE
Hindi ==> GPE
English ==> LANGUAGE
London ==> GPE
W. H. Allen & Co. ==> PERSON
418 ==> CARDINAL
24 February 2021 ==> DATE
26 ==> CARDINAL
August 2019 ==> DATE
February 2015 ==> DATE
Shukla, Pravina ==> ORG
2015 ==> DATE
The Grace of Four Moons ==> ORG
Modern India ==> LOC
Indiana University Press ==> ORG
ISBN ==> ORG
978 ==> CARDINAL
Rachel ==> PERSON
2014 ==> DATE


2 ==> CARDINAL
413–496 ==> CARDINAL
Stein ==> GPE
1998 ==> DATE
India ==> GPE
1st ed. ==> DATE
Oxford ==> ORG
Wiley-Blackwell ==> ORG
978 ==> CARDINAL
2010 ==> DATE
India ==> GPE
2nd ed. ==> DATE
Oxford ==> ORG
Wiley-Blackwell ==> ORG
978 ==> CARDINAL
Michael ==> PERSON
2003 ==> DATE
Upanișads ==> NORP
Gavin D. Flood ==> ORG
Blackwell ==> PERSON
John Wiley & Sons ==> ORG
ISBN ==> ORG
978 ==> CARDINAL
15 March 2012 ==> DATE
Wolpert, S. ( ==> PERSON
2003 ==> DATE
India ==> GPE
7th ed. ==> DATE
Oxford University Press ==> ORG
ISBN ==> ORG
978 ==> CARDINAL
Ali ==> PERSON
J. R. ==> PERSON
Aitchison ==> ORG
J. C. ( ==> PERSON
2005 ==> DATE
Greater India ==> WORK_OF_ART
Earth-Science Reviews ==> ORG
72 ==> DATE
170–173 ==> CARDINAL
Bibcode:2005ESRv... ==> ORG
72 ==> CARDINAL
doi:10.1016/j.earscirev.2005.07.005
Basu ==> ORG
Mahua ==> PERSON
Xavier ==> ORG
Savarimuthu ==> GPE
2017 ==> DATE
Fundamentals of Environmental Studies ==> ORG
Cambridge University Press ==> ORG
ISBN ==> ORG
978 ==> CARD

2 May 2011 ==> DATE
6 ==> CARDINAL
July 2011 ==> DATE
V. K. ==> PERSON
2007 ==> DATE
India ==> GPE
PDF ==> ORG
PDF ==> ORG
27 September 2007 ==> DATE
June 2007 ==> DATE
Rajat ==> ORG
27 July 2009 ==> DATE
India ==> GPE
The Times of India ==> ORG
11 August 2011 ==> DATE
March 2010 ==> DATE
Rajat ==> ORG
8 January 2015 ==> DATE
5th ==> ORDINAL
The Times of India ==> ORG
11 March 2015 ==> DATE
17 ==> CARDINAL
October 2021 ==> DATE
Rajat ==> ORG
16 March 2021 ==> DATE
India ==> GPE
33% ==> PERCENT
last five years ==> DATE
second ==> ORDINAL
The Times of India ==> ORG
3 February 2022 ==> DATE
Rajat ==> ORG
1 February 2022 ==> DATE
The Times of India ==> ORG
3 February 2022 ==> DATE
G. ==> PERSON
5 November 2001 ==> DATE
India ==> GPE
Nuclear Bomb ==> ORG
University of California Press ==> ORG
ISBN ==> ORG
978 ==> CARDINAL
22 ==> CARDINAL
July 2011 ==> DATE
India ==> GPE
France ==> GPE
Civil Nuclear Cooperation, Rediff ==> ORG
25 January 2008 ==> DATE
22 ==> CARDINAL
August 2010 ==> DATE
UK 

Bandyopadhyay ==> NORP
2006 ==> DATE
Indian ==> NORP
Routledge ==> GPE
978 ==> CARDINAL
E. M. ==> PERSON
2007 ==> DATE
American ==> NORP
Guide to Doing Business ==> ORG
India ==> GPE
Adams ==> ORG
978 ==> CARDINAL
Massey ==> GPE
R. ==> NORP
Massey ==> PERSON
1998 ==> DATE
The Music of India ==> ORG
Abhinav Publications ==> ORG
ISBN ==> ORG
978 ==> CARDINAL
Medora ==> GPE
N. ==> PERSON
2003 ==> DATE
Mate Selection ==> WORK_OF_ART
Contemporary India ==> ORG
Versus Arranged Marriages ==> PERSON
Hamon ==> GPE
R. R. ==> PERSON
Ingoldsby ==> GPE
B. B. ==> PERSON
pp ==> GPE
ISBN ==> ORG
978 ==> CARDINAL
Nalin ==> PERSON
30 July 2008 ==> DATE
India ==> GPE
Satellites ==> ORG
Politics and Cultural Change ==> ORG
Taylor & Francis US ==> ORG
ISBN ==> ORG
978 ==> CARDINAL
12 ==> CARDINAL
September 2012 ==> DATE
Narayan ==> GPE
Sunetra Sen ==> PERSON
2015 ==> DATE
Globalization and Television ==> ORG
1990–2010 ==> DATE
Oxford University Press ==> ORG
ISBN ==> ORG
978 ==> CARDINAL
24 September 2010 

1985 ==> DATE
1986 ==> DATE
Kathmandu ==> ORG
1987 ==> DATE
Islamabad ==> GPE
1988 ==> DATE
1990 ==> DATE
1991 ==> DATE
Dhaka ==> ORG
1993 ==> DATE
New Delhi ==> GPE
1995 ==> DATE
1997 ==> DATE
1998 ==> DATE
Kathmandu 2002 ==> ORG
Islamabad 2004 ==> ORG
Dhaka ==> ORG
2005 ==> DATE
New Delhi ==> GPE
2007 ==> DATE
2011 ==> DATE
Kathmandu 2014 ==> ORG
Islamabad 2016 ==> EVENT
Afghanistan ==> GPE
Bhutan ==> GPE
India ==> GPE
Maldives
Nepal ==> ORG
Pakistan ==> GPE
Sri Lanka ==> GPE
Australia ==> GPE
China ==> GPE
European Union ==> ORG
Iran ==> GPE
Japan ==> GPE
Myanmar ==> GPE
South Korea ==> GPE
United States ==> GPE
South Africa ==> GPE
Russia ==> GPE
South Asia Co-operative ==> LOC
Chamber of Commerce and Industry ==> ORG
SAARC Secretariat
SAARC ==> PRODUCT
South Asian University ==> ORG
South Asian ==> NORP
Eight ==> CARDINAL
Group of ==> ORG
5)G8 ==> CARDINAL
Canada ==> GPE
France ==> GPE
Germany ==> GPE
Italy ==> GPE
Japan ==> GPE
Russia ==> GPE
United Kingdom ==> GPE
United States 

In [31]:
displacy.render(doc4,style='ent')  #named entities

In [32]:
print(len(doc4.ents))

6157


In [33]:
print(len(doc4))

41133


## List of Entities

In [34]:
# Collect the text of every entity in doc4
ent_list = [ent.text for ent in doc4.ents]
print(ent_list)

['Search\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCreate', 'usDonate', 'HelpLearn', 'Download', 'PDFPrintable', 'Wikimedia CommonsWikinewsWikiquoteWikivoyage', 'Wikipedia', '2History', 'India', 'India', '2.3Early', 'India', 'India', '5.1Politics', '5.3.1States', '5.3.2Union', '7.3Socio', '8Demographics', '11Notes', '14External', '292', "AcèhАдыгэбзэАдыгабзэAfrikaansAlemannischአማርኛAnarâškielâÆngliscАԥсшәаالعربيةAragonésܐܪܡܝܐԱրեւմտահայերէնArmãneashtiArpetanঅসমীয়াAsturianuAtikamekwअवधीAvañe'ẽАварAymar aruAzərbaycancaتۆرکجهBasa", 'CentralBislamaБългарскиBoarischབོད་ཡིགBosanskiBrezhonegБуряадCatalàЧӑвашлаCebuanoČeštinaChamoruChavacano de ZamboangaChi-ChewaChiShonaChiTumbukaCorsuCymraegDagbanliDanskالدارجةDavvisámegiellaDeitschDeutschދިވެހިބަސްDiné bizaadDolnoserbskiडोटेलीཇོང་ཁEestiΕλληνικάЭрзяньEspañolEsperantoEstremeñuEuskaraEʋegbeفارسیFiji HindiFøroysktFrançaisFryskFulfuldeFurlanGaeilgeGaelgGagauzGàidhligGalegoГӀалгӀай贛語Gĩkũyũگیلکیગુજરાતી𐌲𐌿𐍄𐌹𐍃𐌺गोंयची कोंकणी', 'Gõychi Konknni客家語/Hak-kâ-ngîХальм

## List of Entity Types

In [35]:
# Collect the label of every entity in doc4
ent_type_list = [ent.label_ for ent in doc4.ents]
print(ent_type_list)

['ORG', 'ORG', 'ORG', 'WORK_OF_ART', 'ORG', 'PRODUCT', 'PERSON', 'CARDINAL', 'GPE', 'GPE', 'CARDINAL', 'GPE', 'GPE', 'CARDINAL', 'CARDINAL', 'CARDINAL', 'CARDINAL', 'CARDINAL', 'CARDINAL', 'CARDINAL', 'CARDINAL', 'PRODUCT', 'ORG', 'PERSON', 'ORG', 'GPE', 'ORG', 'PERSON', 'PERSON', 'GPE', 'PERSON', 'ORG', 'ORG', 'ORG', 'LOC', 'GPE', 'GPE', 'ORG', 'WORK_OF_ART', 'ORG', 'WORK_OF_ART', 'ORG', 'GPE', 'ORG', 'CARDINAL', 'CARDINAL', 'ORG', 'CARDINAL', 'DATE', 'CARDINAL', 'GPE', 'GPE', 'PERCENT', 'PERCENT', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'PERSON', 'ORG', 'ORG', 'ORG', 'ORG', 'DATE', 'PERCENT', 'DATE', 'QUANTITY', 'MONEY', 'DATE', 'MONEY', 'QUANTITY', 'ORG', 'MONEY', 'MONEY', 'QUANTITY', 'ORG', 'MONEY', 'MONEY', 'CARDINAL', 'DATE', 'CARDINAL', 'PERSON', 'ORG', 'GPE', 'GPE', 'LOC', 'ORDINAL', 'ORDINAL', 'LOC', 'LOC', 'LOC', 'GPE', 'NORP', 'GPE', 'GPE', 'GPE', 'GPE', 'GPE', 'LOC', 'GPE', 'GPE', 'ORG', 'PRODUCT', 'GPE', 'GPE', 'GPE', 'GPE', 'NORP', 'LOC', 'DATE', 'ORDINAL', 'LOC', 'NORP',
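
The list of labels becomes more informative once it is tallied. As a small sketch, the snippet below counts the first few labels from the output above with `collections.Counter`; running `Counter(ent_type_list)` in the notebook would give the distribution for the whole page.

```python
from collections import Counter

# First entries of ent_type_list, copied from the output above.
labels = ['ORG', 'ORG', 'ORG', 'WORK_OF_ART', 'ORG', 'PRODUCT', 'PERSON',
          'CARDINAL', 'GPE', 'GPE', 'CARDINAL', 'GPE', 'GPE']

# Tally how often each entity label occurs.
label_counts = Counter(labels)
print(label_counts.most_common())
```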

In [36]:
from collections import Counter
# Count how many times each entity text occurs in ent_list.
Counter(ent_list)

Counter({'Search\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCreate': 1,
         'usDonate': 1,
         'HelpLearn': 1,
         'Download': 1,
         'PDFPrintable': 1,
         'Wikimedia CommonsWikinewsWikiquoteWikivoyage': 1,
         'Wikipedia': 7,
         '2History': 1,
         'India': 497,
         '2.3Early': 1,
         '5.1Politics': 1,
         '5.3.1States': 1,
         '5.3.2Union': 1,
         '7.3Socio': 1,
         '8Demographics': 1,
         '11Notes': 1,
         '14External': 1,
         '292': 1,
         "AcèhАдыгэбзэАдыгабзэAfrikaansAlemannischአማርኛAnarâškielâÆngliscАԥсшәаالعربيةAragonésܐܪܡܝܐԱրեւմտահայերէնArmãneashtiArpetanঅসমীয়াAsturianuAtikamekwअवधीAvañe'ẽАварAymar aruAzərbaycancaتۆرکجهBasa": 1,
         'CentralBislamaБългарскиBoarischབོད་ཡིགBosanskiBrezhonegБуряадCatalàЧӑвашлаCebuanoČeštinaChamoruChavacano de ZamboangaChi-ChewaChiShonaChiTumbukaCorsuCymraegDagbanliDanskالدارجةDavvisámegiellaDeitschDeutschދިވެހިބަސްDiné bizaadDolnoserbskiडोटेलीཇོང་ཁEestiΕλληνικ

### Most frequent entities

In [37]:
# Collect the text of every named entity found in doc4.
most_ent = [ent.text for ent in doc4.ents]
print(most_ent)

['Search\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCreate', 'usDonate', 'HelpLearn', 'Download', 'PDFPrintable', 'Wikimedia CommonsWikinewsWikiquoteWikivoyage', 'Wikipedia', '2History', 'India', 'India', '2.3Early', 'India', 'India', '5.1Politics', '5.3.1States', '5.3.2Union', '7.3Socio', '8Demographics', '11Notes', '14External', '292', "AcèhАдыгэбзэАдыгабзэAfrikaansAlemannischአማርኛAnarâškielâÆngliscАԥсшәаالعربيةAragonésܐܪܡܝܐԱրեւմտահայերէնArmãneashtiArpetanঅসমীয়াAsturianuAtikamekwअवधीAvañe'ẽАварAymar aruAzərbaycancaتۆرکجهBasa", 'CentralBislamaБългарскиBoarischབོད་ཡིགBosanskiBrezhonegБуряадCatalàЧӑвашлаCebuanoČeštinaChamoruChavacano de ZamboangaChi-ChewaChiShonaChiTumbukaCorsuCymraegDagbanliDanskالدارجةDavvisámegiellaDeitschDeutschދިވެހިބަސްDiné bizaadDolnoserbskiडोटेलीཇོང་ཁEestiΕλληνικάЭрзяньEspañolEsperantoEstremeñuEuskaraEʋegbeفارسیFiji HindiFøroysktFrançaisFryskFulfuldeFurlanGaeilgeGaelgGagauzGàidhligGalegoГӀалгӀай贛語Gĩkũyũگیلکیગુજરાતી𐌲𐌿𐍄𐌹𐍃𐌺गोंयची कोंकणी', 'Gõychi Konknni客家語/Hak-kâ-ngîХальм

In [38]:
Counter(ent_list).most_common()

[('India', 497),
 ('978', 158),
 ('ISBN', 141),
 ('Indian', 140),
 ('PDF', 60),
 ('pp', 47),
 ('2011', 33),
 ('Pakistan', 29),
 ('2018', 29),
 ('China', 28),
 ('1998', 27),
 ('Oxford University Press', 27),
 ('British', 26),
 ('first', 26),
 ('2007', 26),
 ('Metcalf & Metcalf 2006', 25),
 ('Hindu', 24),
 ('2009', 24),
 ('two', 22),
 ('2006', 22),
 ('2014', 21),
 ('1', 21),
 ('South Asia', 19),
 ('2001', 18),
 ('July 2011', 17),
 ('second', 16),
 ('2010', 15),
 ('Cambridge University Press', 15),
 ('Russia', 14),
 ('2003', 14),
 ('Routledge', 14),
 ('1997', 14),
 ('1994', 14),
 ('23', 14),
 ('Punjab', 13),
 ('Kashmir', 13),
 ('one', 13),
 ('Congress', 13),
 ('2019', 13),
 ('The Times of India', 13),
 ('26', 12),
 ('22', 12),
 ('2017', 12),
 ('Sharma', 12),
 ('1970', 12),
 ('2004', 11),
 ('18 October 2021', 11),
 ('Delhi', 10),
 ('Muslim', 10),
 ('four', 10),
 ('Buddhist', 10),
 ('Gujarat', 10),
 ('17', 10),
 ('New Delhi', 10),
 ('2005', 10),
 ('Asher & Talbot 2008', 10),
 ('Harle', 10),

In [39]:
Counter(ent_list).most_common(5)

[('India', 497), ('978', 158), ('ISBN', 141), ('Indian', 140), ('PDF', 60)]
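The top five above include strings such as '978', 'ISBN' and 'PDF', which look like residue from the reference list rather than meaningful entities. One possible way to get a more useful ranking is to count only entities whose label is of interest. The sketch below is a minimal illustration: `top_entities` is a hypothetical helper name, the set of labels to keep is an arbitrary assumption, and the stand-in objects only mimic the `.text` and `.label_` attributes of the spans in `doc4.ents`.

```python
from collections import Counter
from types import SimpleNamespace


def top_entities(ents, keep_labels=("GPE", "ORG", "PERSON", "NORP", "LOC"), n=5):
    """Return the n most common entity texts whose label is in keep_labels."""
    counts = Counter(ent.text for ent in ents if ent.label_ in keep_labels)
    return counts.most_common(n)


# Stand-in entities so the sketch runs without a spaCy model;
# in the notebook you would pass doc4.ents instead.
sample = [
    SimpleNamespace(text="India", label_="GPE"),
    SimpleNamespace(text="978", label_="CARDINAL"),
    SimpleNamespace(text="India", label_="GPE"),
    SimpleNamespace(text="2011", label_="DATE"),
    SimpleNamespace(text="Oxford University Press", label_="ORG"),
]
print(top_entities(sample))
```

In the notebook the call would be `top_entities(doc4.ents)`. Which labels to keep depends on the question being asked, so treat the default tuple as a starting point.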

# Homework

Scrape the text of https://en.wikipedia.org/wiki/Cinema_of_India

Find the most frequent named entities.

In [40]:
url1='https://en.wikipedia.org/wiki/Cinema_of_India'

In [41]:
print(url1)

https://en.wikipedia.org/wiki/Cinema_of_India


In [42]:
# Fetch the page; the timeout keeps the cell from hanging on a stalled connection.
request1 = requests.get(url1, timeout=10)
print(request1)

<Response [200]>


In [43]:
print(request1.text)

<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled" lang="en" dir="ltr">
<head>
<meta charset="UTF-8"/>
<title>Cinema of India - Wikipedia</title>
<script>document.documentElement.className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled";(function(){var cookie=document.cookie.match

In [44]:
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup_requests1 = BeautifulSoup(request1.text, 'html.parser')
print(soup_requests1)

<!DOCTYPE html>
<html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>Cinema of India - Wikipedia</title>
<script>document.documentElement.className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled";(function(){var cookie=document.cookie.match

In [45]:
text1=soup_requests1.body.text
print(text1)


Jump to content





Toggle sidebar












Search

















Create account





Personal tools



Create account
Log in




				Pages for logged out editors learn more



TalkContributions











Navigation


Main pageContentsCurrent eventsRandom articleAbout WikipediaContact usDonate




Contribute


HelpLearn to editCommunity portalRecent changesUpload file




Tools


What links hereRelated changesUpload fileSpecial pagesPermanent linkPage informationCite this pageWikidata item




Print/export


Download as PDFPrintable version




In other projects


Wikimedia Commons




Languages

On this Wikipedia the language links are at the top of the page across from the article title. Go to top.















Contents
move to sidebar
hide




(Top)





1History







1.1Silent films (1890s–1920s)







1.2Talkies (1930s–mid-1940s)







1.3Golden Age (late 1940s–1960s)







1.41970s–present





1.4.1Hindi







1.4.2Telugu







1.4.3Tamil







1.4.4Malayalam



In [46]:
type(text1)

str
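The extracted text is a plain `str`, but most of it is blank lines and indentation left over from the page layout. spaCy keeps that whitespace, which may be why an entity can span two unrelated menu items separated only by newlines. A minimal sketch of one way to tidy the string before parsing is shown below; `collapse_blank_lines` is a hypothetical helper name, and the sample string is a shortened stand-in for `text1`.

```python
def collapse_blank_lines(raw):
    """Strip each line and drop the empty ones, keeping one item per line."""
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


# Shortened stand-in for text1, with the same kind of blank runs and tabs.
sample = (
    "\nJump to content\n\n\n\n\nToggle sidebar\n\n"
    "\t\t\t\tPages for logged out editors learn more\n"
)
print(collapse_blank_lines(sample))
```

In the notebook this would be used as `nlp(collapse_blank_lines(text1))`. It does not separate words that were fused across adjacent HTML elements (for example 'usDonate'); BeautifulSoup's `get_text` accepts a `separator` argument that can help with that.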

In [47]:
doc5=nlp(text1)
doc5





In [48]:
# Collect the text of every named entity found in doc5.
most_ent_cinema = [ent.text for ent in doc5.ents]
print(most_ent_cinema)

['Search\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nCreate', 'usDonate', 'HelpLearn', 'Download', 'PDFPrintable', 'Wikimedia Commons', 'Wikipedia', '1930s–mid-1940s', '1.4.2Telugu', '1.4.4Malayalam', '1.4.5Kannada', '2Cultural', '4.4Pan', 'India', '5Music', 'Bhasha', '8.4Bhojpuri', '8.10Kannada', '8.15Marathi', '8.19Punjabi', '8.22Tamil', '12Explanatory', '15External', 'Cinema of India\n\n\n\n', '46', 'Gõychi Konknni한국어हिन्दीBahasa IndonesiaItalianoಕನ್ನಡमैथिलीമലയാളംमराठीBahasa MelayuNederlands日本語Norsk', 'Wikipedia', 'Cinema of IndiaNo', '2022)[1', '2015)[2]Produced', '2016)[4]Total2,020,000,000\xa0•\xa0', 'Gross', '2019)[6]Total₹190 billion', '2.56', 'India', 'Society\nIndians', 'Folklore', 'Languages\nHolidays\nReligion\n\nArts', 'Cinema\nDance\nFestivals', 'Radio\nTelevision', 'State Emblem', 'India', 'Ministry of Culture\nMinistry of Tourism', 'India', 'India', 'India', 'Indian', 'India', 'the late 20th', 'Mumbai', 'Chennai, Hyderabad', 'Visakhapatnam', 'Kochi', 'Kolkata', 'Bangalore', 'Bhu
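Many of the strings in this list are page-layout residue rather than real entities: menu items fused together, section numbers glued to headings, and spans containing newlines. A rough heuristic filter can remove the most obvious cases before counting. The sketch below is only an illustration under stated assumptions: `looks_like_residue` and both regular expressions are hypothetical and would need tuning for other pages.

```python
import re

# Heuristic patterns (assumptions, not part of spaCy):
# a section number fused to a capitalised heading, e.g. '1.4.2Telugu' or '5Music'
SECTION_PREFIX = re.compile(r"^\d+(\.\d+)*[A-Z]")
# a lowercase letter immediately followed by an uppercase one, e.g. 'usDonate'
FUSED_WORDS = re.compile(r"[a-z][A-Z]")


def looks_like_residue(text):
    """Return True if an entity string looks like page-layout residue."""
    return (
        "\n" in text
        or SECTION_PREFIX.search(text) is not None
        or FUSED_WORDS.search(text) is not None
    )


# A few strings taken from the output above.
sample = [
    "usDonate",
    "1.4.2Telugu",
    "Cinema of India\n\n\n\n",
    "1930s–mid-1940s",
    "India",
    "Mumbai",
]
cleaned = [text for text in sample if not looks_like_residue(text)]
print(cleaned)
```

The fused-word pattern also rejects legitimate names with internal capitals, so it trades some recall for a cleaner list. Cleaning the text before calling `nlp` is likely the more robust fix; this filter is a fallback.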

In [49]:
Counter(most_ent_cinema).most_common(5)

[('Indian', 131),
 ('India', 125),
 ('first', 55),
 ('Hindu', 42),
 ('Bollywood', 28)]
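This ranking mixes different kinds of entity in one list, so a word like 'first' sits next to place and group names. Grouping the counts by entity label is one way to read the result more easily. The sketch below is a minimal illustration: `top_by_label` is a hypothetical helper name, and the labels in the stand-in data are illustrative rather than actual model output.

```python
from collections import Counter, defaultdict
from types import SimpleNamespace


def top_by_label(ents, n=3):
    """Map each entity label to its n most common entity texts."""
    per_label = defaultdict(Counter)
    for ent in ents:
        per_label[ent.label_][ent.text] += 1
    return {label: counts.most_common(n) for label, counts in per_label.items()}


# Stand-in entities; in the notebook you would pass doc5.ents instead.
sample = [
    SimpleNamespace(text="India", label_="GPE"),
    SimpleNamespace(text="Indian", label_="NORP"),
    SimpleNamespace(text="India", label_="GPE"),
    SimpleNamespace(text="first", label_="ORDINAL"),
    SimpleNamespace(text="Indian", label_="NORP"),
    SimpleNamespace(text="Mumbai", label_="GPE"),
]
for label, ranking in top_by_label(sample).items():
    print(label, ranking)
```

With the real document, `top_by_label(doc5.ents, n=5)` would give a separate top five per label, which makes it easier to ignore categories such as ordinals or dates.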